LIMIT THEOREMS FOR SOME ADAPTIVE MCMC ALGORITHMS 
WITH SUBGEOMETRIC KERNELS: PART II 
Abstract. We prove a central limit theorem for a general class of adaptive Markov
Chain Monte Carlo algorithms driven by sub-geometrically ergodic Markov kernels. We 
discuss in detail the special case of stochastic approximation. We use the result to analyze 
the asymptotic behavior of an adaptive version of the Metropolis Adjusted Langevin 
algorithm with a heavy tailed target density. 

1. Introduction 



This work is a sequel of Atchadé and Fort (2008) and develops central limit theorems for adaptive MCMC (AMCMC) algorithms. Previous works on the subject include Andrieu and Moulines (2006) and Saksman and Vihola (2009), where central limit theorems are proved for certain AMCMC algorithms driven by geometrically ergodic Markov kernels. There is a need to understand the sub-geometric case. Indeed, many Markov kernels routinely used in practice are not geometrically ergodic. For example, if the target distribution of interest has heavy tails, then the Random Walk Metropolis algorithm (RWMA) and the Metropolis Adjusted Langevin algorithm (MALA) result in sub-geometric Markov kernels (Jarner and Roberts (2002a)).

We consider adaptive MCMC algorithms driven by Markov kernels $\{P_\theta, \theta \in \Theta\}$ such that each kernel $P_\theta$ enjoys a polynomial rate of convergence towards $\pi$ and satisfies a drift condition of the form $P_\theta V \le V - cV^{1-\alpha} + b$ for some $\alpha \in (0,1]$ (uniformly in $\theta$ over compact sets). We obtain a central limit theorem when $\alpha < 1/2$ under some additional stability conditions. This result is very close to what can be proved for Markov chains under similar conditions. Indeed, it is known (Jarner and Roberts (2002a)) that irreducible and aperiodic Markov chains for which the drift condition $PV \le V - cV^{1-\alpha} + b 1_C$ holds for some small set $C$ satisfy a central limit theorem when $\alpha \le 1/2$. The slight loss of efficiency in our case ($\alpha < 1/2$ versus $\alpha \le 1/2$) is typical of martingale approximation-based proofs. The proof of the central limit theorem is based on a martingale approximation technique initiated by Kipnis and Varadhan (1986) and Maxwell and Woodroofe (2000). The method is a Poisson equation-type method, but where the Poisson kernel is replaced by a more general resolvent kernel. We have used a variant of the same technique in Atchadé and Fort (2008) to study the strong law of large numbers for AMCMC.

Adaptive MCMC has been studied in a number of recent papers. Besides the above mentioned papers, results related to the convergence of marginal distributions and the law of large numbers can be found e.g. in (Rosenthal and Roberts (2007); Bai (2008)). For specific examples and a review of the methodological developments, see e.g. Roberts and Rosenthal (2006); Andrieu and Thoms (2008); Atchadé et al. (2009).



The rest of the paper is organized as follows. The main CLT result is presented in Section 2.3. Adaptive MCMC driven by stochastic approximation is considered in Section 2.6. To illustrate, we apply our theory to an adaptive version of the Metropolis adjusted Langevin algorithm (MALA) with a heavy tailed target distribution (Section 2.7). Most of the proofs are postponed to Section 3.



2. Statement of the results 

2.1. Notations. We start with some notations that will be used throughout the paper. For a transition kernel $P$ on a measurable general state space $(\mathbb{T}, \mathcal{B}(\mathbb{T}))$, denote by $P^n$, $n \ge 0$, its $n$-th iterate defined as

\[ P^0(x,A) \stackrel{\mathrm{def}}{=} \delta_x(A), \qquad P^{n+1}(x,A) \stackrel{\mathrm{def}}{=} \int P(x,dy)\, P^n(y,A), \quad n \ge 0; \]

$\delta_x(dt)$ stands for the Dirac mass at $\{x\}$. $P^n$ is a transition kernel on $(\mathbb{T}, \mathcal{B}(\mathbb{T}))$ that acts both on bounded measurable functions $f$ on $\mathbb{T}$ and on $\sigma$-finite measures $\mu$ on $(\mathbb{T}, \mathcal{B}(\mathbb{T}))$ via $P^n f(\cdot) = \int P^n(\cdot, dy) f(y)$ and $\mu P^n(\cdot) = \int \mu(dx) P^n(x, \cdot)$.

If $V : \mathbb{T} \to [1, +\infty)$ is a function, the $V$-norm of a function $f : \mathbb{T} \to \mathbb{R}$ is defined as $|f|_V \stackrel{\mathrm{def}}{=} \sup_{\mathbb{T}} |f|/V$. When $V = 1$, this is the supremum norm. The set of functions with finite $V$-norm is denoted by $L_V$.

If $\mu$ is a signed measure on a measurable space $(\mathbb{T}, \mathcal{B}(\mathbb{T}))$, the total variation norm $\|\mu\|_{TV}$ is defined as

\[ \|\mu\|_{TV} \stackrel{\mathrm{def}}{=} \sup_{\{f,\ |f|_1 \le 1\}} |\mu(f)| = 2 \sup_{A \in \mathcal{B}(\mathbb{T})} |\mu(A)| = \sup_{A \in \mathcal{B}(\mathbb{T})} \mu(A) - \inf_{A \in \mathcal{B}(\mathbb{T})} \mu(A); \]

and the $V$-norm, where $V : \mathbb{T} \to [1, +\infty)$ is a function, is defined as $\|\mu\|_V \stackrel{\mathrm{def}}{=} \sup_{\{g,\ |g|_V \le 1\}} |\mu(g)|$. Observe that $\|\cdot\|_{TV}$ corresponds to $\|\cdot\|_V$ with $V = 1$.

In the Euclidean space $\mathbb{R}^n$, we use $\langle a, b \rangle$ to denote the inner product and $|a| = \sqrt{\langle a, a \rangle}$ the Euclidean norm. We denote by $\mathbb{R}$ the set of real numbers and by $\mathbb{N}$ the set of nonnegative integers.

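2.2. Adaptive MCMC: definition. Let $\mathsf{X}$ be a general state space endowed with a countably generated $\sigma$-field $\mathcal{X}$. Let $\Theta$ be an open subspace of $\mathbb{R}^q$, the $q$-dimensional Euclidean space, and $\mathcal{B}(\Theta)$ its Borel $\sigma$-algebra. Let $\{P_\theta, \theta \in \Theta\}$ be a family of Markov transition kernels on $(\mathsf{X}, \mathcal{X})$ such that for any $(x, A) \in \mathsf{X} \times \mathcal{X}$, $\theta \mapsto P_\theta(x, A)$ is measurable. We assume that for any $\theta \in \Theta$, the Markov kernel $P_\theta$ admits an invariant distribution $\pi$. Let $\{K_n, n \ge 0\}$ be a family of nonempty compact subspaces of $\Theta$ such that $K_n \subseteq K_{n+1}$. Let $\Pi : \mathsf{X} \times \Theta \to \mathsf{X}_0 \times \Theta_0$ be a measurable function, the so-called re-projection function, where $\mathsf{X}_0 \times \Theta_0$ is some measurable subset of $\mathsf{X} \times \Theta$. We assume that $\Pi(x, \theta) = (x, \theta)$ if $\theta \in \Theta_0$. For an integer $k \ge 0$ we define $\Pi_k(x, \theta) = \Pi(x, \theta)$ if $k = 0$ and $\Pi_k(x, \theta) = (x, \theta)$ if $k \ge 1$. Let $R(n; \cdot, \cdot) : (\mathsf{X} \times \Theta) \times (\mathcal{X} \times \mathcal{B}(\Theta)) \to [0, 1]$ be a sequence of Markov kernels on $\mathsf{X} \times \Theta$ with the following property. For any $n \ge 0$, $A \in \mathcal{X}$, $(x, \theta) \in \mathsf{X} \times \Theta$,

\[ R(n; (x, \theta), A \times \Theta) = P_\theta(x, A). \tag{1} \]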

In most cases in practice, the adaptation is driven by stochastic approximation. One 
such example of stochastic approximation is obtained by taking R(n;-,-) of the form 
R(n;(x,6),(dx',d9')) = PQ(x,dx')6g + y n y g r x '\(dO'). But the main example of stochastic 
approximation considered in this paper is 

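\[ R(n; (x, \theta), (dx', d\theta')) = \int q^{(1)}_\theta(x, dy)\, q^{(2)}_\theta((x, y), dx')\, \delta_{\theta + \gamma_n \Phi_\theta(x, y)}(d\theta'), \]

where $q^{(1)}_\theta$ and $q^{(2)}_\theta$ are Markov kernels. Obviously, in order for (1) to hold, these kernels ought to satisfy the constraint

\[ \int q^{(1)}_\theta(x, dy)\, q^{(2)}_\theta((x, y), dx') = P_\theta(x, dx'). \]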

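Throughout the paper and without further mention, we assume that (1) holds. We are interested in the Markov chain $\{(X_n, \theta_n, \nu_n, \xi_n), n \ge 0\}$ defined on $\mathsf{X} \times \Theta \times \mathbb{N} \times \mathbb{N}$ with transition kernel $\bar P$,

\[ \bar P\big((x, \theta, \nu, \xi), (dx', d\theta', d\nu', d\xi')\big) = R\big(\nu + \xi; \Pi_\xi(x, \theta), (dx', d\theta')\big) \times \Big( 1_{\{\theta' \in K_\nu\}}\, \delta_\nu(d\nu')\, \delta_{\xi+1}(d\xi') + 1_{\{\theta' \notin K_\nu\}}\, \delta_{\nu+1}(d\nu')\, \delta_0(d\xi') \Big). \tag{2} \]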
Algorithmically, this Markov chain can be described as follows.

Algorithm 2.1. Given $(X_n, \theta_n, \nu_n, \xi_n)$:

a: generate $(X_{n+1}, \theta_{n+1}) \sim R(\nu_n + \xi_n; \Pi_{\xi_n}(X_n, \theta_n), \cdot)$;
b: if $\theta_{n+1} \in K_{\nu_n}$ then set $\nu_{n+1} = \nu_n$, $\xi_{n+1} = \xi_n + 1$;
c: if $\theta_{n+1} \notin K_{\nu_n}$ then set $\nu_{n+1} = \nu_n + 1$ and $\xi_{n+1} = 0$.
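The three steps of Algorithm 2.1 can be sketched in code as follows. This is only an illustrative sketch: the callables `sample_R`, `reproject` and `in_compact`, and the toy kernel used to drive them, are hypothetical stand-ins and are not taken from the paper.

```python
import math
import random

_RNG = random.Random(0)  # fixed seed so the toy run is reproducible


def adaptive_step(state, sample_R, reproject, in_compact):
    """One transition of the chain (X, theta, nu, xi) of Algorithm 2.1.

    sample_R(n, x, theta) -> (x_new, theta_new): a draw from R(n; (x, theta), .)
    reproject(x, theta)   -> (x0, theta0): the re-projection map Pi
    in_compact(nu, theta) -> True when theta lies in K_nu
    All three callables are hypothetical stand-ins supplied by the user.
    """
    x, theta, nu, xi = state
    # Pi_xi equals Pi when xi == 0 and the identity when xi >= 1.
    if xi == 0:
        x, theta = reproject(x, theta)
    # Step a: draw (X_{n+1}, theta_{n+1}) from R(nu + xi; Pi_xi(X_n, theta_n), .).
    x_new, theta_new = sample_R(nu + xi, x, theta)
    if in_compact(nu, theta_new):
        # Step b: theta stayed inside K_nu, keep nu and advance xi.
        return (x_new, theta_new, nu, xi + 1)
    # Step c: theta left K_nu, move to the next compact set and reset xi.
    return (x_new, theta_new, nu + 1, 0)


def toy_sample_R(n, x, theta):
    # Toy kernel (an assumption for illustration): Gaussian random walk in x
    # with scale exp(theta), and a move of theta with step size 1 / (n + 1).
    x_new = x + math.exp(theta) * _RNG.gauss(0.0, 1.0)
    theta_new = theta + (abs(x_new) - 1.0) / (n + 1)
    return x_new, theta_new


def run_toy(n_steps=200):
    state = (0.0, 0.0, 0, 0)
    path = [state]
    for _ in range(n_steps):
        state = adaptive_step(
            state,
            toy_sample_R,
            reproject=lambda x, th: (0.0, 0.0),            # Pi sends everything to (x0, theta0)
            in_compact=lambda nu, th: abs(th) <= 1.0 + nu,  # K_nu = [-(1 + nu), 1 + nu]
        )
        path.append(state)
    return path
```

Along any path produced this way, $\nu_n$ counts the re-projections so far and $\xi_n$ counts the steps since the last one, so each transition either increments $\xi$ or increments $\nu$ and resets $\xi$ to zero.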

We denote by $\mathbb{P}_{x,\theta,\nu,\xi}$ and $\mathbb{E}_{x,\theta,\nu,\xi}$ the probability and expectation operator when the initial distribution of the Markov chain is $\delta_{(x,\theta,\nu,\xi)}$. Throughout the paper, we will assume that the initial state of the process is fixed to $(x_0, \theta_0, 0, 0)$ for some arbitrary element $(x_0, \theta_0) \in \mathsf{X}_0 \times \Theta_0$ and we will systematically write $\mathbb{P}$ and $\mathbb{E}$ instead of $\mathbb{P}_{x_0,\theta_0,0,0}$ and $\mathbb{E}_{x_0,\theta_0,0,0}$ respectively.

Remark 1. Algorithm 2.1 is fairly general and encompasses the two main strategies used in practice to control the adaptation parameter.

(1) For example, one obtains the framework of re-projections on randomly varying compact sets developed in (Andrieu et al. (2005); Andrieu and Moulines (2006)) by taking $\{K_n, n \ge 0\}$ such that $\Theta = \bigcup_n K_n$, $\Theta_0 \subseteq K_0$ and $K_n \subset \mathrm{int}(K_{n+1})$, where $\mathrm{int}(A)$ is the interior of $A$.

(2) But we can also set $\Theta_0 = K_k = K$ for all $k \ge 0$ for some compact subset $K$ of $\Theta$. We then obtain another commonly used approach where the re-projection is done on a fixed compact set $K$. See e.g. Atchadé and Rosenthal (2005).

Let $\{\mathcal{F}_n, n \ge 0\}$ denote the natural filtration of the Markov chain $\{(X_n, \theta_n, \nu_n, \xi_n), n \ge 0\}$. It is easy to compute using (1) that for any bounded measurable function $f : \mathsf{X} \to \mathbb{R}$,

\[ \mathbb{E}\big(f(X_{n+1}) \mid \mathcal{F}_n\big)\, 1_{\{\xi_n > 0\}} = P_{\theta_n} f(X_n)\, 1_{\{\xi_n > 0\}}, \quad \mathbb{P}\text{-a.s.} \tag{3} \]

Equation (3) together with the strong Markov property are the two main properties of the process $\{(X_n, \theta_n, \nu_n, \xi_n), n \ge 0\}$ that will be used in the sequel.

We now introduce another stochastic process closely related to the adaptive chain defined above. For $l \ge 0$ an integer, we consider the nonhomogeneous Markov chain $\{(X_n, \theta_n), n \ge 0\}$ with initial distribution $\delta_{x,\theta}$ and sequence of transition Markov kernels

\[ P_l\big(n; (x_1, \theta_1), (dx', d\theta')\big) = R\big(l + n; (x_1, \theta_1), (dx', d\theta')\big). \]

Its distribution and expectation operator are denoted respectively by $\mathbb{P}^{(l)}_{x,\theta}$ and $\mathbb{E}^{(l)}_{x,\theta}$. We will denote by $\{\mathcal{F}_n, n \ge 0\}$ its natural filtration (for convenience in the notations, we omit its dependence on $(x, \theta, l)$). Again it follows from (1) that for any bounded measurable function $f : \mathsf{X} \to \mathbb{R}$,

\[ \mathbb{E}^{(l)}_{x,\theta}\big(f(X_{n+1}) \mid \mathcal{F}_n\big) = P_{\theta_n} f(X_n), \quad \mathbb{P}^{(l)}_{x,\theta}\text{-a.s.} \tag{4} \]
For $K$ a compact subset of $\Theta$, we define the stopping time $\tau_K$ (with respect to the nonhomogeneous Markov chain $\{(X_n, \theta_n), n \ge 0\}$) as

\[ \tau_K = \inf\{k \ge 1 : \theta_k \notin K\}, \]

with the usual convention that $\inf \emptyset = \infty$. Clearly the two processes defined above are closely related. We will refer to $\{(X_n, \theta_n), n \ge 0\}$ as the re-projection free process. The general strategy that we adopt to study the Markov chain $\{(X_n, \theta_n, \nu_n, \xi_n), n \ge 0\}$ (a strategy borrowed from Andrieu et al. (2005)) consists in first studying the re-projection free process $\{(X_n, \theta_n), n \ge 0\}$ and showing that the former process inherits the limit behavior of the latter.

2.3. General results. The main assumption of the paper is the following.

A1 There exist $\alpha \in (0, 1]$ and a measurable function $V : \mathsf{X} \to [1, \infty)$, $\sup_{x \in \mathsf{X}_0} V(x) < \infty$, with the following properties. For any compact subset $K$ of $\Theta$, there exist $b, c \in (0, \infty)$ (that depend on $K$) such that for any $(x, \theta) \in \mathsf{X} \times K$,

\[ P_\theta V(x) \le V(x) - c V^{1-\alpha}(x) + b \tag{5} \]

and for any $\beta \in [0, 1 - \alpha]$, $\kappa \in [0, \alpha^{-1}(1 - \beta) - 1]$, there exists $C = C(V, \kappa, \beta, K)$ such that

\[ (n + 1)^\kappa\, \|P^n_\theta(x, \cdot) - \pi(\cdot)\|_{V^\beta} \le C\, V^{\beta + \alpha\kappa}(x), \quad n \ge 0. \tag{6} \]

Notice that (5) implies that $\pi(V^{1-\alpha}) < \infty$. We will also assume that the number of re-projections is finite.



A2
\[ \mathbb{P}\Big( \sup_{n \ge 0} \nu_n < \infty \Big) = 1. \tag{7} \]



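We introduce a new pseudo-metric on $\Theta$. For $\beta \in [0, 1]$, $\theta, \theta' \in \Theta$, set

\[ D_\beta(\theta, \theta') \stackrel{\mathrm{def}}{=} \sup_{|f|_{V^\beta} \le 1}\ \sup_{x \in \mathsf{X}} \frac{|P_\theta f(x) - P_{\theta'} f(x)|}{V^\beta(x)}. \]

Under A1 and A2 a weak law of large numbers holds.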

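Theorem 2.1. Assume A1-A2. Let $\beta \in [0, 1 - \alpha)$ and $f_\theta : \mathsf{X} \to \mathbb{R}$ a family of measurable functions of $L_{V^\beta}$ such that $\pi(f_\theta) = 0$, $\theta \mapsto f_\theta(x)$ is measurable and $\sup_{\theta \in K} |f_\theta|_{V^\beta} < \infty$ for any compact subset $K$ of $\Theta$. Suppose also that there exist $\varepsilon > 0$, $\kappa > 0$, $\beta + \alpha\kappa < 1 - \alpha$ such that for any $(x, \theta, l) \in \mathsf{X}_0 \times \Theta_0 \times \mathbb{N}$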



E 



(0 



fc>i 



< oo. (8) 



Then $n^{-1} \sum_{k=1}^n f_{\theta_{k-1}}(X_k)$ converges in $\mathbb{P}$-probability to zero.

Proof. The proof is given in Section 3.5. □



Remark 2. A strong law of large numbers also holds under similar assumptions (Atchadé and Fort (2008)). It is an open problem whether A1, A2 and (8) imply that a weak law of large numbers holds for measurable functions $f$ for which $\pi(|f|) < \infty$ without the additional assumption that $f \in L_{V^\beta}$, $0 \le \beta < 1 - \alpha$.

For the central limit theorem, we introduce a few additional notations. For $f \in L_{V^\beta}$ with $\pi(f) = 0$, and $a \in [0, 1/2]$ we introduce the resolvent functions

\[ g_a(x, \theta) = \sum_{j \ge 0} (1 - a)^{j+1} P^j_\theta f(x). \]

Whenever $g_a$ is well defined it satisfies the approximate Poisson equation

\[ f(x) = (1 - a)^{-1} g_a(x, \theta) - P_\theta g_a(x, \theta). \tag{9} \]

When $a = 0$, we write $g(x, \theta)$, which is the usual solution to the Poisson equation $f(x) = g(x, \theta) - P_\theta g(x, \theta)$. Define also

\[ H_{a,\theta}(x, y) = g_a(y, \theta) - P_\theta g_a(x, \theta), \tag{10} \]

where $P_\theta g_a(x, \theta) = \int P_\theta(x, dz)\, g_a(z, \theta)$. We start by showing that under A1-A2 the partial sum $\sum_{k=1}^n f(X_k)$ admits a martingale approximation.
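The approximate Poisson equation can be checked numerically on a small example. The sketch below is a sanity check under assumed inputs, not part of the paper: the three-state kernel `P`, its stationary law `pi` and the centered function `f` are hypothetical, and the resolvent is computed by truncating its defining series.

```python
def mat_vec(P, v):
    # (P v)(i) = sum_j P[i][j] v[j], the action of a kernel on a function.
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]


def resolvent(P, f, a, n_terms=2000):
    """Truncated series g_a = sum_{j>=0} (1 - a)^{j+1} P^j f."""
    g = [0.0] * len(f)
    Pjf = list(f)        # P^0 f
    weight = 1.0 - a     # (1 - a)^{j+1} for j = 0
    for _ in range(n_terms):
        g = [gi + weight * v for gi, v in zip(g, Pjf)]
        Pjf = mat_vec(P, Pjf)
        weight *= 1.0 - a
    return g


def poisson_residual(P, f, a):
    # max_x | f(x) - ( g_a(x) / (1 - a) - P g_a(x) ) |
    g = resolvent(P, f, a)
    Pg = mat_vec(P, g)
    return max(abs(fi - (gi / (1.0 - a) - pgi)) for fi, gi, pgi in zip(f, g, Pg))


# Hypothetical toy chain: pi = (1/4, 1/2, 1/4) is invariant and pi(f) = 0.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
pi = [0.25, 0.50, 0.25]
f = [1.0, 0.0, -1.0]

residual_a = poisson_residual(P, f, 0.1)   # approximate Poisson equation, a = 0.1
residual_0 = poisson_residual(P, f, 0.0)   # usual Poisson equation, a = 0
```

For this particular chain $Pf = f/2$, so the series can be summed by hand: $g_a = \frac{1-a}{1 - (1-a)/2} f$, which gives $g = 2f$ at $a = 0$.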

Theorem 2.2. Assume A1-A2 with $\alpha < 1/2$. Let $\beta \in [0, \tfrac{1}{2} - \alpha)$ and $f \in L_{V^\beta}$ such that
7r(/) = 0. Let k > 1, 5 G (0, 1) be such that 2(5 + a(n + 5) < 1 — a. Take p G (i, a- n d 
let {a n , n > 0} be any sequence of positive numbers such that a n G (0,1/2], a n oc n~ p . 
Suppose that for any (x, 6, b, I) G Xq x 0q x [0, 1 — a] x N 



E 

X 



(I) 



<oo. (11) 



X)l { r K >* } *" 1+pM A(^,^ 1 )V 5 ^C'« + ^(X fc ) 
fe>l 

T/ien 

lim n^ 1 / 2 {fi X k) ~ Ha^e^X^Xk)) = 0, in ¥ -probability. 
fc=i 

Proof. We show in Lemma 3.8 that the same martingale approximation holds for the re-projection free process $\{(X_n, \theta_n), n \ge 0\}$, and this property transfers to the adaptive chain $\{(X_n, \theta_n, \nu_n, \xi_n), n \ge 0\}$ as a consequence of Lemma 3.12. □

The process $\big\{ \sum_{j=1}^k n^{-1/2} H_{a_n, \theta_{j-1}}(X_{j-1}, X_j)\, 1_{\{\xi_{j-1} > 0\}},\ 1 \le k \le n \big\}$ is a martingale array but does not satisfy a CLT in general. To derive a CLT we strengthen A2.

A3 There exists a $\Theta$-valued random variable $\theta_\star$ such that with $\mathbb{P}$-probability one, $\{\theta_n, n \ge 0\}$ remains in a compact set and $\lim_{n \to \infty} D_\beta(\theta_n, \theta_\star) = 0$ for any $\beta \in [0, 1 - \alpha]$.

Notice that the compact set referred to in A3 is sample path dependent.

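Theorem 2.3. Assume A1 and A3 with $\alpha < 1/2$. Let $\beta \in [0, \tfrac{1}{2} - \alpha)$, $f \in L_{V^\beta}$, $\kappa$, $\delta$, $\rho$ and $\{a_n, n \ge 0\}$ as in Theorem 2.2. Suppose that the diminishing adaptation condition (11) holds and

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^n \Big\{ g^2_{a_n}(X_k, \theta_{k-1}) - P_{\theta_{k-1}} g^2_{a_n}(X_k, \theta_{k-1}) \Big\} = 0, \quad \text{in } \mathbb{P}\text{-probability}. \tag{12} \]

Then there exists a nonnegative random variable $\sigma_\star^2(f)$ such that $n^{-1/2} \sum_{k=1}^n f(X_k)$ converges weakly to a random variable $Z$ with characteristic function $\varphi(t) = \mathbb{E}\big[ \exp\big( -\tfrac{1}{2} \sigma_\star^2(f)\, t^2 \big) \big]$. Moreover

\[ \sigma_\star^2(f) = \int \pi(dx) \big\{ 2 f(x) g(x, \theta_\star) - f^2(x) \big\}, \quad \mathbb{P}\text{-a.s.} \]

Proof. See Section 3.6. □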

2.4. On assumption (12). Assumption (12) is needed to establish the weak law of large numbers in the CLT. When $\{X_n, n \ge 0\}$ is a stationary Markov chain, (12) automatically holds. The proof is based on a result due to Maxwell and Woodroofe (2000). The stationarity assumption is not restrictive in the case of Harris recurrent Markov chains.

Proposition 2.4. Suppose that $\{X_n, n \ge 0\}$ is a stationary and ergodic Markov chain with invariant distribution $\pi$ and transition kernel $P$ that satisfies (5) and (6) with $\alpha < 1/2$. Let $f \in L_{V^\beta}$ with $\beta \in [0, 1/2 - \alpha)$. Then (12) holds.

Proof. See Section 3.7. □

In the general adaptive case, the simplest approach to checking (12) is through appropriate moment conditions.

Proposition 2.5. Assume A\^and J^with a < 1/2. Let j3 E [0, \ — a), f E Lyp, K,5,p 
and {a n , n > 0} as in Theorem \2.SX Suppose that there exists e > such that for any 

(x,e,i) e x x e x n 



sup n-V'i 



k=l 



< oo. (13) 



n>l 

Then {!§) hold. 

Proof. See Section [3751 □ 
One can always check (13) if $\alpha < 1/3$ and $\beta \in [0, 1 - 3\alpha)$.

Corollary 2.6. Assume A1 and A3 with $\alpha < 1/3$. Let $\beta \in [0, 1 - 3\alpha)$, $f \in L_{V^\beta}$, $\kappa$, $\delta$, $\rho$ and $\{a_n, n \ge 0\}$ as in Theorem 2.2. Suppose that (11) holds. Then the conclusion of Theorem 2.3 holds.

Proof. If $\alpha < 1/3$ and we take $\beta \in [0, 1 - 3\alpha)$ then we can find $\varepsilon > 0$ such that $2(\beta + \alpha) + \varepsilon < 1 - \alpha$ and by Proposition 3.4 (ii), Eq. (13) holds. The stated result thus follows from Proposition 2.5 and Theorem 2.3. □

2.5. Some additional remarks on the assumptions. 

2.5.1. On Assumption A1. In many cases, A1 can be checked by establishing a drift condition and a minorization condition. For example if, uniformly over compact subsets $K$ of $\Theta$, $P_\theta$ satisfies a polynomial drift condition of the form $P_\theta V \le V - cV^{1-\alpha} + b 1_C$ for some small set $C$, $\alpha \in (0, 1]$, and such that the level sets of $V$ are 1-small, then (5) and (6) hold. This point is thoroughly discussed in Atchadé and Fort (2008) (Section 2.4 and Appendix A) and the references therein.



Assumption A1 also holds for geometrically ergodic Markov kernels and in this case we recover the CLT result of Andrieu and Moulines (2006). Indeed, suppose that uniformly over compact subsets $K$ of $\Theta$, there exist $C \in \mathcal{X}$, $\nu$ a probability measure on $(\mathsf{X}, \mathcal{X})$, $b, \varepsilon > 0$ and $\lambda \in (0, 1)$ such that $\nu(C) > 0$, $P_\theta(x, \cdot) \ge \varepsilon \nu(\cdot) 1_C(x)$ and $P_\theta V \le \lambda V + b 1_C$. Then for any $\alpha \in (0, 1]$, $P_\theta V \le V - (1 - \lambda) V^{1-\alpha} + b$, thus (5) holds. Moreover by explicit convergence bounds for geometrically ergodic Markov chains (see e.g. Baxendale (2005)), for any $\beta \in (0, 1]$

\[ \sup_{\theta \in K} \|P^n_\theta(x, \cdot) - \pi(\cdot)\|_{V^\beta} \le C_\beta(K)\, \rho_\beta^n\, V^\beta(x). \]

A fortiori (6) holds. Also under the geometric drift condition, if $\beta \in [0, 1/2)$ then we can find $0 < \alpha < 1/2$ and $\varepsilon > 0$ such that $2(\beta + \alpha) + \varepsilon < 1$, and since $V^\delta$-moments of geometrically ergodic adaptive MCMC are bounded in $n$ for any $\delta \in [0, 1)$, we get (13). In this case and assuming (11), Theorem 2.3 yields a CLT for all functions $f \in L_{V^\beta}$ with $\beta \in [0, 1/2)$, which is the same CLT obtained in Andrieu and Moulines (2006) (Theorem 8). (Roughly speaking, assuming (11) at no extra cost is similar to setting $\beta = 0$ in their theorem.)
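The passage from the geometric drift condition to the polynomial drift condition (5) in the paragraph above rests on a one-line inequality, written out here as a worked restatement. It uses only $V \ge 1$, $\alpha \in (0, 1]$ and $1_C \le 1$.

```latex
\[
P_\theta V \;\le\; \lambda V + b\,1_C
 \;=\; V - (1-\lambda)\,V + b\,1_C
 \;\le\; V - (1-\lambda)\,V^{1-\alpha} + b .
\]
% Last step: V >= 1 and alpha in (0,1] give V >= V^{1-alpha}, and 1_C <= 1.
```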

2.5.2. On assumptions A2-A3. Assumption A3 is a natural assumption to make when a CLT is sought. Whether A2 or A3 holds depends on the adaptation strategy. We show below how to check A3 when the adaptation is driven by stochastic approximation.

2.5.3. On the diminishing adaptation conditions (8) and (11). It is well known that adaptive MCMC can fail to converge when the so-called diminishing adaptation condition (which embodies the idea that one should adapt less and less with the iterations) does not hold. Here, the diminishing adaptation takes the form of conditions (8) and (11). Indeed, (8) and (11) cannot hold unless $D_\beta(\theta_n, \theta_{n-1})$ converges to zero in some sense. These conditions are not difficult to check. Typically $D_b(\theta_k, \theta_{k-1}) \le C \gamma_k V^\eta(X_k)$ for some positive numbers $\gamma_k$ and $\eta > 0$; then we can check (8) or (11) using Proposition 3.5.
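2.6. Checking A3 for AMCMC driven by stochastic approximation. Adaptive MCMC is often driven by stochastic approximation. We consider an example of stochastic approximation dynamics and show how to check A3. Let $\{\gamma_n\}$ be a sequence of positive numbers. Let $q^{(1)}_\theta : \mathsf{X} \times \mathcal{X} \to [0, 1]$ and $q^{(2)}_\theta : (\mathsf{X} \times \mathsf{X}) \times \mathcal{X} \to [0, 1]$ be two Markov kernels such that

\[ \int q^{(1)}_\theta(x, dy)\, q^{(2)}_\theta((x, y), dx') = P_\theta(x, dx'). \]

Let $\Phi : \Theta \times \mathsf{X} \times \mathsf{X} \to \Theta$ be a measurable function. For convenience we write $\Phi_\theta(x, y)$ instead of $\Phi(\theta, x, y)$. We consider the adaptive MCMC algorithm where the kernels $R$ are given as

\[ R(n; (x, \theta), (dx', d\theta')) = \int q^{(1)}_\theta(x, dy)\, q^{(2)}_\theta((x, y), dx')\, \delta_{\theta + \gamma_n \Phi_\theta(x, y)}(d\theta'). \tag{14} \]

Under (14), the dynamics on $\theta_n$ in Algorithm 2.1 can then be written as

\[ \theta_{n+1} = \theta_n + \gamma_{\nu_n + \xi_n} \big( h(\theta_n) + \epsilon^{(1)}_{n+1} + \epsilon^{(2)}_{n+1} \big), \quad \text{on } \{\xi_n > 0\},\ \mathbb{P}\text{-a.s.} \]

where $\epsilon^{(1)}_{n+1} = \Upsilon_{\theta_n}(X_n) - h(\theta_n)$, $\epsilon^{(2)}_{n+1} = \Phi_{\theta_n}(X_n, Y_{n+1}) - \Upsilon_{\theta_n}(X_n)$, where $Y_{n+1}$ is a random variable with conditional distribution $q^{(1)}_{\theta_n}(X_n, \cdot)$ given $\mathcal{F}_n$ and where

\[ \Upsilon_\theta(x) = \int q^{(1)}_\theta(x, dy)\, \Phi_\theta(x, y), \quad \text{and} \quad h(\theta) = \int \pi(dx)\, \Upsilon_\theta(x). \]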



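Following Andrieu et al. (2005), we assume that

B1 (1) $\{K_n, n \ge 0\}$ is such that $\Theta = \bigcup_n K_n$, $\Theta_0 \subseteq K_0$ and $K_n \subset \mathrm{int}(K_{n+1})$, where $\mathrm{int}(A)$ is the interior of $A$.
(2) The function $h$ is a continuous function and there exists a continuously differentiable function $w : \Theta \to [0, \infty)$ such that

(a) for any $\theta \in \Theta$, $\langle \nabla w(\theta), h(\theta) \rangle \le 0$, the set $L \stackrel{\mathrm{def}}{=} \{\theta \in \Theta : \langle \nabla w(\theta), h(\theta) \rangle = 0\}$ is non-empty and the closure of $w(L)$ has an empty interior.

(b) there exists $M_0 > 0$ such that $L \cup \Theta_0 \subset \{\theta : w(\theta) < M_0\}$ and for any $M \ge M_0$, $W_M \stackrel{\mathrm{def}}{=} \{\theta : w(\theta) \le M\}$ is a compact set.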



For integers p > 0, n > 1 and a compact subset K of 0, we define the random variable 

l 



~(i) 



(2) 



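where $\epsilon^{(1)}_{n+1} = \Upsilon_{\theta_n}(X_n) - h(\theta_n)$ and

\[ \epsilon^{(2)}_{n+1} = \Phi_{\theta_n}(X_n, Y_{n+1}) - \int q^{(1)}_{\theta_n}(X_n, dy)\, \Phi_{\theta_n}(X_n, y), \]

and where the conditional distribution of $Y_{n+1}$ given $\mathcal{F}_n$ is $q^{(1)}_{\theta_n}(X_n, \cdot)$.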

$C_{n,p}(K)$ is the magnitude of the errors in the stochastic approximation. Notice that $C_{n,p}(K)$ is defined from the re-projection free process. A key result shown by Andrieu et al. (2005) is that when B1 holds, the convergence of a SA algorithm depends mainly on $C_{n,p}(K)$. The framework considered here is slightly different from Andrieu et al. (2005) but the

10 Y. F. ATCHADE AND G. FORT 

result still holds. The proof follows the same lines as in Andrieu et al. (2005) and we omit the details.



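Proposition 2.7. Assume B1, $\lim_n \gamma_n = 0$ and $\sum_n \gamma_n = \infty$. Suppose that for any $M > 0$ large enough and for any $\delta > 0$

\[ \lim_{p \to \infty}\ \sup_{(x, \theta) \in \mathsf{X}_0 \times \Theta_0} \mathbb{P}^{(p)}_{x,\theta}\big( C_{1,p}(W_M) > \delta \big) = 0, \tag{15} \]

and for any $p \ge 0$,

\[ \lim_{n \to \infty}\ \sup_{(x, \theta) \in \mathsf{X}_0 \times \Theta_0} \mathbb{P}^{(p)}_{x,\theta}\big( C_{n,p}(K_p) > \delta \big) = 0. \tag{16} \]

Then A3 holds.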



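We now show that (15)-(16) hold true under A1. Assume that the function $\Upsilon$ satisfies

B2 There exists $\eta > 0$, $2(\eta + \alpha) < 1$ such that for any compact subset $K$ of $\Theta$, $b \in [0, 1 - \alpha]$, $\theta, \theta' \in K$,

\[ \sup_{\theta \in K}\ \sup_{x \in \mathsf{X}} V^{-2\eta}(x) \int q^{(1)}_\theta(x, dy)\, |\Phi_\theta(x, y)|^2 < \infty, \quad \text{and} \quad D_b(\theta, \theta') + |\Upsilon_\theta - \Upsilon_{\theta'}|_{V^\eta} \le C\, |\theta - \theta'|, \tag{17} \]

for some finite constant $C$ that depends possibly on $K$.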

Proposition 2.8. Assume A1 with $\alpha < 1/2$ and (14). Suppose that B1 and B2 hold. Suppose also that $\lim_n \gamma_n = 0$ and $\sum_n \gamma_n = \infty$ and for any $p \ge 0$,

lim ( 7p +n-i " Tp+n)^ 1 - = and (7^ + lk k~ p + 7 fc +P ) < °°, (18) 

n>l 

for some $\rho \in \big(0, (1 - \alpha)(\eta + \alpha)^{-1} - 1\big)$. Then A3 holds.

Proof. See Section 3.9. □

2.7. Example: Adaptive Langevin algorithms. We illustrate the theory above with an application to the Metropolis-adjusted Langevin algorithm (MALA). In this section, $\mathsf{X}$ is the $d$-dimensional Euclidean space $\mathbb{R}^d$ and $\pi$ is a positive density on $\mathsf{X}$ with respect to the Lebesgue measure (denoted $\mu_{Leb}$ or $dx$). The MALA algorithm is an effective Metropolis-Hastings algorithm whose proposal kernel is obtained by discretization of the Langevin diffusion

\[ dX_t = \tfrac{1}{2} e^{\theta}\, \nabla \log \pi(X_t)\, dt + e^{\theta/2}\, dB_t, \quad X_0 = x, \]

where $\theta \in \mathbb{R}$ is a scale parameter and $\{B_t, t \ge 0\}$ a $d$-dimensional standard Brownian motion. Denote by $q_\theta(x, y)$ the density of the $d$-dimensional Gaussian distribution with mean $b_\theta(x)$ and covariance matrix $e^{\theta} I_d$ where

\[ b_\theta(x) = x + \tfrac{1}{2} e^{\theta}\, \nabla \log \pi(x). \]
The MALA works as follows. Given $X_n = x$, we propose a new value $Y \sim q_\theta(x, \cdot)$. Then with probability $\alpha_\theta(X_n, Y)$, we 'accept $Y$' and set $X_{n+1} = Y$, and with probability $1 - \alpha_\theta(X_n, Y)$, we 'reject $Y$' and set $X_{n+1} = X_n$. The acceptance probability is given by

\[ \alpha_\theta(x, y) = 1 \wedge \frac{\pi(y)\, q_\theta(y, x)}{\pi(x)\, q_\theta(x, y)}. \]

The convergence and optimal scaling of MALA are studied in detail in Roberts and Tweedie (1996); Roberts and Rosenthal (2001). In practice the performance of this algorithm depends on the choice of the scale parameter $\theta$. In high-dimensional spaces (and under some regularity conditions) it is optimal to set $\theta = \theta_\star$ such that the average acceptance probability of the algorithm in stationarity is 0.574. In general, $\theta_\star$ is not available and its computation would require a tedious fine-tuning of the sampler. Adaptive MCMC provides a straightforward approach to properly scale the algorithm.

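The parameter space is $\Theta = \mathbb{R}$. For $\theta \in \Theta$, denote by $P_\theta$ the transition kernel of the MALA algorithm with proposal $q_\theta$. We also introduce the functions

\[ A_\theta(x) \stackrel{\mathrm{def}}{=} \int_{\mathsf{X}} \alpha_\theta(x, y)\, q_\theta(x, y)\, \mu_{Leb}(dy), \qquad a(\theta) \stackrel{\mathrm{def}}{=} \int_{\mathsf{X}} A_\theta(x)\, \pi(x)\, \mu_{Leb}(dx). \]

Let $\{K_n, n \ge 0\}$ be a family of nonempty compact intervals of $\Theta$ such that $\bigcup_n K_n = \mathbb{R}$, $K_n \subset \mathrm{int}(K_{n+1})$. Therefore by construction B1-(1) holds. Let $\Theta_0 = \{\theta_0\}$ and $\mathsf{X}_0 = \{x_0\}$ for some arbitrary point $(x_0, \theta_0) \in \mathsf{X} \times K_0$. The re-projection function is $\Pi(x, \theta) = (x_0, \theta_0)$ for any $(x, \theta) \in \mathsf{X} \times \Theta$. We also have $\Pi_k(x, \theta) = (x, \theta)$ if $k > 0$ and $\Pi_k(x, \theta) = \Pi(x, \theta)$ if $k = 0$. Obviously many other choices are possible. The adaptive MALA we consider is the following.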

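Algorithm 2.2. Initialization: Let $\bar\alpha$ be the target acceptance probability (taken as 0.574). Choose $(X_0, \theta_0) \in \mathsf{X}_0 \times \Theta_0$, $\nu_0 = 0$ and $\xi_0 = 0$.
Iteration: Given $(X_n, \theta_n, \nu_n, \xi_n)$: set $(X, \theta) = \Pi_{\xi_n}(X_n, \theta_n)$.

a: generate $Y_{n+1} \sim q_\theta(X, \cdot)$. With probability $\alpha_\theta(X, Y_{n+1})$, set $X_{n+1} = Y_{n+1}$ and with probability $1 - \alpha_\theta(X, Y_{n+1})$, set $X_{n+1} = X$.
b: Compute

\[ \theta_{n+1} = \theta + \frac{1}{n+1} \big( \alpha_\theta(X, Y_{n+1}) - \bar\alpha \big). \tag{19} \]

c: If $\theta_{n+1} \in K_{\nu_n}$ then set $\nu_{n+1} = \nu_n$ and $\xi_{n+1} = \xi_n + 1$. Otherwise, if $\theta_{n+1} \notin K_{\nu_n}$, then set $\nu_{n+1} = \nu_n + 1$ and $\xi_{n+1} = 0$.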
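A minimal sketch of this adaptive MALA in dimension $d = 1$ is given below. Several choices are assumptions made only for illustration and do not come from the paper: the Student-type target with `dof` degrees of freedom, the intervals $K_n = [-(n+1), n+1]$, the starting point $(x_0, \theta_0) = (0, 0)$, and the step size $1/(\nu_n + \xi_n + 1)$, which is one reading of how the index of $R(\nu_n + \xi_n; \cdot, \cdot)$ in Algorithm 2.1 enters the update of $\theta$.

```python
import math
import random


def log_target(x, dof):
    # Log-density, up to a constant, of a heavy-tailed target
    # pi(x) proportional to (1 + x^2)^{-(1 + dof) / 2}  (assumed example).
    return -0.5 * (dof + 1.0) * math.log1p(x * x)


def grad_log_target(x, dof):
    return -(dof + 1.0) * x / (1.0 + x * x)


def proposal_mean(x, theta, dof):
    # b_theta(x) = x + (1/2) e^theta * grad log pi(x)
    return x + 0.5 * math.exp(theta) * grad_log_target(x, dof)


def log_proposal(x, y, theta, dof):
    # Log-density at y of the Gaussian N(b_theta(x), e^theta).
    var = math.exp(theta)
    diff = y - proposal_mean(x, theta, dof)
    return -0.5 * diff * diff / var - 0.5 * math.log(2.0 * math.pi * var)


def acceptance(x, y, theta, dof):
    # alpha_theta(x, y) = min(1, pi(y) q(y, x) / (pi(x) q(x, y))), on the log scale.
    log_ratio = (log_target(y, dof) + log_proposal(y, x, theta, dof)
                 - log_target(x, dof) - log_proposal(x, y, theta, dof))
    return math.exp(min(0.0, log_ratio))


def adaptive_mala(n_iter, dof=5.0, target_acc=0.574, x0=0.0, theta0=0.0, seed=1):
    rng = random.Random(seed)
    x, theta, nu, xi = x0, theta0, 0, 0
    xs, thetas, accs, nus = [], [], [], []
    for _ in range(n_iter):
        if xi == 0:
            # Pi_0 re-projects to (x0, theta0); Pi_k is the identity for k >= 1.
            x, theta = x0, theta0
        # Step a: propose and accept/reject.
        y = proposal_mean(x, theta, dof) + math.exp(0.5 * theta) * rng.gauss(0.0, 1.0)
        acc = acceptance(x, y, theta, dof)
        x_new = y if rng.random() < acc else x
        # Step b: move theta towards the target acceptance rate
        # (step size 1 / (nu + xi + 1) is an assumption, see the lead-in).
        theta_new = theta + (acc - target_acc) / (nu + xi + 1)
        # Step c: compare with the assumed compact set K_nu = [-(nu + 1), nu + 1].
        if abs(theta_new) <= nu + 1:
            xi += 1
        else:
            nu, xi = nu + 1, 0
        x, theta = x_new, theta_new
        xs.append(x)
        thetas.append(theta)
        accs.append(acc)
        nus.append(nu)
    return xs, thetas, accs, nus
```

With `dof = 5` the assumed target behaves like $(1 + |x|^2)^{-(d + \nu)/2}$ with $d = 1$ and $\nu = 5$. The update increases $\theta$ (a larger proposal scale) when the observed acceptance probability exceeds the target and decreases it otherwise.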

In this algorithm, the kernel $R(n; \cdot, \cdot)$ takes the form

\[ R(n; (x, \theta), (dx', d\theta')) = \int q_\theta(x, dy) \big( \alpha_\theta(x, y)\, \delta_y(dx') + (1 - \alpha_\theta(x, y))\, \delta_x(dx') \big)\, \delta_{\Phi_n(\theta, x, y)}(d\theta'), \]

where $\Phi_n(\theta, x, y) = \theta + (n + 1)^{-1} (\alpha_\theta(x, y) - \bar\alpha)$. Thus (14) holds. We make the following assumption.

C1 $\bar\alpha \in (0, 1)$, $\lim_{\theta \to +\infty} a(\theta) = 0$, $\lim_{\theta \to -\infty} a(\theta) = 1$.

Proposition 2.9. Under C1, the function $h(\theta) = a(\theta) - \bar\alpha$ satisfies B1-(2) with $L = \{\theta \in \mathbb{R} : a(\theta) = \bar\alpha\}$ and $w(\theta) = \int_0^\theta \cosh(u)\, (\bar\alpha - a(u))\, du + K$ for some finite constant $K$, where $\cosh(u) = (e^u + e^{-u})/2$ is the hyperbolic cosine.

Proof. See Section 3.10.1. □



We assume that the target density $\pi$ is heavy tailed as in Kamatani (To appear).
C2 We assume that $\pi : \mathbb{R}^d \to (0, \infty)$ is of class $C^2$ and there exists $\eta > d$ such that

\[ \limsup_{|x| \to \infty} \langle x, \nabla \log \pi(x) \rangle \le -\eta, \quad \lim_{|x| \to \infty} |\nabla \log \pi(x)| = 0, \quad \lim_{|x| \to \infty} \|\nabla^2 \log \pi(x)\| = 0, \tag{20} \]

where for a matrix $A$, $\|A\|$ denotes its Frobenius norm.



The next proposition is a paraphrase of Theorem 5 of Kamatani (To appear).

Proposition 2.10. Assume C2. For $s \in (2, 2 + \eta - d)$, define $V_s(x) = (1 + |x|^2)^{s/2}$ and $\alpha = 2/s$. Let $C$ be a compact subset of $\mathbb{R}^d$ with $\mu_{Leb}(C) > 0$. For any compact subset $K$ of $\Theta$, there exist $\varepsilon, c, b \in (0, \infty)$ such that



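\[ \inf_{\theta \in K} P_\theta(x, dy) \ge \varepsilon\, \frac{1_C(y)\, \mu_{Leb}(dy)}{\mu_{Leb}(C)}\, 1_C(x), \qquad \sup_{\theta \in K} P_\theta V_s(x) \le V_s(x) - c\, V_s^{1-\alpha}(x) + b\, 1_C(x). \]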

For the smoothness we have

Proposition 2.11. Assume that $|\nabla \log \pi(x)|$ is a bounded function. Let $K$ be a compact convex subset of $\Theta$. There exists a finite constant $C(K)$ such that for any $f \in L_{V^\beta}$, $\beta \in [0, 1]$, any $\theta, \theta' \in K$,

\[ \Big| \int \alpha_\theta(x, y)\, q_\theta(x, y)\, f(y)\, dy - \int \alpha_{\theta'}(x, y)\, q_{\theta'}(x, y)\, f(y)\, dy \Big| \le C(K)\, |f|_{V^\beta}\, |\theta - \theta'|\, V^\beta(x). \tag{21} \]

Proof. See Section 3.10.2. □

We now apply Theorem 2.3 to get a CLT for the adaptive MALA.



Theorem 2.12. Assume C1 and C2 with $\eta > d + 4$. Let $s \in (6, 2 + \eta - d)$ and let $f : \mathsf{X} \to \mathbb{R}$ be a measurable function such that $\pi(f) = 0$ and $|f(x)| \le C (1 + |x|^2)^b$ for some $b \in [0, \tfrac{s}{2} - 3)$ and some finite constant $C$. Then there exists a nonnegative random variable $\sigma_\star^2(f)$ such that $n^{-1/2} \sum_{k=1}^n f(X_k)$ converges weakly to a random variable $Z$ with characteristic function $\varphi(t) = \mathbb{E}\big[ \exp\big( -\tfrac{1}{2} \sigma_\star^2(f)\, t^2 \big) \big]$.

Remark 3. If $\pi$ is positive and of class $C^2$ and $\pi(x) \sim (1 + |x|^2)^{-(d + \nu)/2}$ in the tails, then C2 holds with $\eta = \nu + d$ and Theorem 2.12 guarantees a CLT for $\nu > 4$. Compare with $\nu > 2$ for Harris recurrent Markov chains satisfying A1.

Proof. A1 holds as a consequence of Proposition 2.10 (see e.g. Atchadé and Fort (2008), Section 2.4 and Appendix A). Proposition 2.9 shows that B1-(2) holds and Proposition 2.11 implies that B2 holds. Therefore A3 holds as a consequence of Proposition 2.8. (11) is an easy consequence of Proposition 2.11 and Proposition 3.5. We thus conclude with Corollary 2.6. □

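In the above theorem the asymptotic variance $\sigma_\star^2(f)$ takes values in the set $\{\sigma_\theta^2(f), \theta \in L\}$, where $L = \{\theta \in \mathbb{R} : a(\theta) = \bar\alpha\}$ and

\[ \sigma_\theta^2(f) = \int \pi(dx) \Big\{ f^2(x) + 2 \sum_{k > 0} f(x)\, P^k_\theta f(x) \Big\}. \]

In particular, if $L = \{\theta_\star\}$ and $\sigma_{\theta_\star}^2(f) > 0$, then $n^{-1/2} \sum_{k=1}^n f(X_k)$ converges weakly to $\mathcal{N}(0, \sigma_{\theta_\star}^2(f))$.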
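The series form of the asymptotic variance above and the Poisson-equation form $\int \pi(dx) \{2 f(x) g(x, \theta) - f^2(x)\}$ of Theorem 2.3 agree whenever $g = \sum_{j \ge 0} P_\theta^j f$ converges. The sketch below checks this numerically for a single fixed kernel. The three-state chain, its stationary law and the function `f` are hypothetical inputs chosen for illustration, and both series are truncated.

```python
def mat_vec(P, v):
    # (P v)(i) = sum_j P[i][j] v[j]
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]


def variance_via_poisson(P, pi, f, n_terms=500):
    """sum_x pi(x) {2 f(x) g(x) - f(x)^2} with g = sum_{j>=0} P^j f (truncated)."""
    g = [0.0] * len(f)
    Pjf = list(f)
    for _ in range(n_terms):
        g = [gi + v for gi, v in zip(g, Pjf)]
        Pjf = mat_vec(P, Pjf)
    return sum(p * (2.0 * fi * gi - fi * fi) for p, fi, gi in zip(pi, f, g))


def variance_via_series(P, pi, f, n_terms=500):
    """sum_x pi(x) {f(x)^2 + 2 sum_{k>=1} f(x) P^k f(x)} (truncated)."""
    total = sum(p * fi * fi for p, fi in zip(pi, f))
    Pkf = mat_vec(P, f)  # P^1 f
    for _ in range(n_terms):
        total += 2.0 * sum(p * fi * v for p, fi, v in zip(pi, f, Pkf))
        Pkf = mat_vec(P, Pkf)
    return total


# Hypothetical toy chain with invariant law pi and pi(f) = 0; here P f = f / 2.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
pi = [0.25, 0.50, 0.25]
f = [1.0, 0.0, -1.0]

sigma2_poisson = variance_via_poisson(P, pi, f)
sigma2_series = variance_via_series(P, pi, f)
```

For this chain both expressions reduce to $3\,\pi(f^2) = 1.5$, since $g = 2f$ and $\pi(f P^k f) = 2^{-k} \pi(f^2)$.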



3. Proofs 

The proofs are organized as follows. The weak law of large numbers (Theorem 2.1) is proved in Section 3.5; the CLT (Theorem 2.3) is proved in Section 3.6. In Section 3.1 we develop some preliminary results on the resolvent functions $g_a$ and we establish some basic results on the asymptotic behavior of the nonhomogeneous process $\{(X_n, \theta_n), n \ge 0\}$ in Sections 3.2-3.3. The results in Section 3.4 (in particular Lemma 3.12) serve as a link and allow us to reduce the limiting behavior of the adaptive algorithm $\{(X_n, \theta_n, \nu_n, \xi_n), n \ge 0\}$ to that of the nonhomogeneous Markov chain $\{(X_n, \theta_n), n \ge 0\}$.

Throughout the proof, C(K) denotes a finite constant that depends on the compact set 
K and on the constants in the above assumptions. But to simplify the notations, we will 
not keep track of these constants so the actual value of C(K) might be different from one 
appearance to the next. 

3.1. Resolvent kernels and approximate Poisson's equations. In this section, $K$ is a given compact subset of $\Theta$ and $\beta \in [0, 1 - \alpha]$. We consider a family of functions $f_\theta \in L_{V^\beta}$, $\theta \in \Theta$, such that $\pi(f_\theta) = 0$. For $a \in (0, 1)$ we define the resolvent function associated with $f_\theta$ as

\[ \tilde g_a(x, \theta) = \sum_{j=0}^{\infty} (1 - a)^{j+1} P^j_\theta f_\theta(x) = \sum_{j=0}^{\infty} (1 - a)^{j+1} \bar P^j_\theta f_\theta(x), \]

where $\bar P_\theta = P_\theta - \pi$. Similarly we define

\[ \tilde g(x, \theta) = \sum_{j=0}^{\infty} P^j_\theta f_\theta(x) = \sum_{j=0}^{\infty} \bar P^j_\theta f_\theta(x). \]

When $f_\theta = f$ does not depend on $\theta \in \Theta$, and to help keep the notation clear, we write $g_a(x, \theta)$ (resp. $g(x, \theta)$) instead of $\tilde g_a(x, \theta)$ (resp. $\tilde g$). It is easy to see that when $\tilde g_a$ is well defined, it satisfies the following approximate Poisson equation

\[ f_\theta(x) = (1 - a)^{-1} \tilde g_a(x, \theta) - P_\theta \tilde g_a(x, \theta). \tag{22} \]

Similarly $\tilde g$, when well-defined, satisfies the Poisson equation

\[ f_\theta(x) = \tilde g(x, \theta) - P_\theta \tilde g(x, \theta). \tag{23} \]

We introduce the function

\[ \zeta_\kappa(a) \stackrel{\mathrm{def}}{=} \sum_{j \ge 0} (1 - a)^{j+1} (1 + j)^{-\kappa}. \]

We will need the following lemma. 

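Lemma 3.1. For any $a \in (0, 1/2]$ and $\kappa \ge 0$,

\[ \zeta_\kappa(a) \le \begin{cases} \sum_{j \ge 1} j^{-\kappa} & \text{if } \kappa > 1, \\ -\log(2a) + 1 & \text{if } \kappa = 1, \\ 2^{-1+\kappa}\, \Gamma(1 - \kappa)\, a^{-1+\kappa} & \text{if } 0 \le \kappa < 1, \end{cases} \]

where $\Gamma(x) := \int_0^\infty u^{x-1} e^{-u}\, du$ is the Gamma function.

Proof. $(1 - a)^j \le 1$ for all $j \ge 1$. Therefore, for $\kappa > 1$, $\sum_{j \ge 0} (1 - a)^{j+1} (1 + j)^{-\kappa} \le \sum_{j \ge 1} j^{-\kappa}$. For $\kappa = 1$, we note that $\frac{d}{da} \big\{ \sum_{j \ge 0} (1 - a)^{j+1} (1 + j)^{-1} \big\} = -a^{-1}$. Therefore for $a \in (0, 1/2]$, $\sum_{j \ge 0} (1 - a)^{j+1} (1 + j)^{-1} = \sum_{j \ge 1} 2^{-j} j^{-1} - \log(2a) \le -\log(2a) + 1$. Finally, if $0 \le \kappa < 1$, by monotonicity, $\sum_{j \ge 0} (1 - a)^{j+1} (1 + j)^{-\kappa} \le \int_0^\infty (1 - a)^x x^{-\kappa}\, dx = \int_0^\infty x^{(1 - \kappa) - 1} e^{-\beta x}\, dx = \Gamma(1 - \kappa)\, \beta^{-1+\kappa}$, where $\beta = -\log(1 - a)$. For $a \in (0, 1/2]$, $-\log(1 - a) \le 2a$ and we conclude that $\sum_{j \ge 0} (1 - a)^{j+1} (1 + j)^{-\kappa} \le 2^{-1+\kappa}\, \Gamma(1 - \kappa)\, a^{-1+\kappa}$. □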

Proposition 3.2. Assume A1.

(i): Let $\kappa \in [0, \alpha^{-1}(1 - \beta) - 1]$. There exists a finite constant $C(K)$ such that for any $(x, \theta) \in \mathsf{X} \times K$ and any $a \in (0, 1/2]$

\[ |\tilde g_a(x, \theta)| \le C(K)\, |f_\theta|_{V^\beta}\, \zeta_\kappa(a)\, V^{\beta + \alpha\kappa}(x). \tag{24} \]

(ii): Suppose that $\alpha < 1/2$. Let $\kappa \in (1, \alpha^{-1}(1 - \beta) - 1]$. There exists a finite constant $C(K)$ such that for any $(x, \theta) \in \mathsf{X} \times K$ and any $a \in (0, 1/2]$

\[ |\tilde g(x, \theta) - \tilde g_a(x, \theta)| \le C(K)\, |f_\theta|_{V^\beta} \Big( \int_0^a \zeta_{\kappa - 1}(u)\, du \Big) V^{\beta + \alpha\kappa}(x). \tag{25} \]

Proof. (i) is a direct consequence of (6).

To prove (ii), we note the identity $1 - (1 - a)^{j+1} = (j + 1) \int_0^a (1 - u)^j\, du$ and then write



,r 



9) - ~g a (x, 0)|<£(1-(1- \Hfe{* 

< c(K)|/ e | y/3 y^(x) V f (1 - u)'d«(i +j) 



(in 



<C(K)|/ e | y/3 ^ +aK ( 3 ;)2 1 - K / C«-l(«)<* 



o 

Since $\kappa > 1$ and $a > 0$, the interchange of the summation and integral signs is permitted.

□ 

Remark 4. One can check using Lemma 3.1 that for $\kappa > 1$, $\int_0^a \zeta_{\kappa - 1}(u)\, du \to 0$ as $a \to 0$. Hence a direct consequence of Proposition 3.2 is that for any $\beta \in [0, 1 - 2\alpha)$ ($\alpha < 1/2$), any $\kappa \in (1, \alpha^{-1}(1 - \beta) - 1]$, there exists a finite constant $C(K)$ such that for any $(x, \theta) \in \mathsf{X} \times K$,

\[ |\tilde g(x, \theta)| \le C(K)\, |f_\theta|_{V^\beta}\, V^{\beta + \alpha\kappa}(x). \tag{26} \]

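Proposition 3.3. Assume A1.

(i): For any $\kappa, \delta \geq 0$ such that $\kappa + \delta \leq \alpha^{-1}(1-\beta)-1$, there exists a finite constant $C(K)$ such that for any $\theta, \theta' \in K$, $x \in \mathsf{X}$ and $a \in (0,1/2]$

$$|g_a(x,\theta) - g_a(x,\theta')| \leq C(K)\sup_{\theta\in K}|f_\theta|_{V^\beta}\;\zeta_\kappa(a)\left(\zeta_\delta(a)\,D_{\beta+\alpha\delta}(\theta,\theta') + |f_\theta - f_{\theta'}|_{V^\beta}\right)V^{\beta+\alpha(\kappa+\delta)}(x).$$

(ii): Assume $\alpha < 1/2$. For any $\beta \in [0, 1-2\alpha)$, any $\kappa \geq 0$, $\delta > 1$ such that $\kappa + \delta \leq \alpha^{-1}(1-\beta)-1$, there exists a finite constant $C(K)$ such that for any $x \in \mathsf{X}$, $\theta, \theta' \in K$ and any $a \in (0,1/2]$

$$|g(x,\theta) - g(x,\theta')| \leq C(K)\sup_{\theta\in K}|f_\theta|_{V^\beta}\left(\int_0^a \zeta_{\delta-1}(u)\,du + \zeta_\kappa(a)\,|f_\theta - f_{\theta'}|_{V^\beta} + \zeta_\kappa(a)\zeta_\delta(a)\,D_{\beta+\alpha\delta}(\theta,\theta')\right)V^{\beta+\alpha(\kappa+\delta)}(x).$$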

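Proof. We have

$$g_a(x,\theta) - g_a(x,\theta') = \sum_{j\geq 0}(1-a)^{j+1}\left(P_\theta^j f_\theta(x) - P_{\theta'}^j f_\theta(x)\right) + \sum_{j\geq 0}(1-a)^{j+1}P_{\theta'}^j\left(f_\theta(x) - f_{\theta'}(x)\right).$$

By Proposition 3.2 (i) we bound the second term in the rhs as follows.

$$\left|\sum_{j\geq 0}(1-a)^{j+1}P_{\theta'}^j\left(f_\theta(x) - f_{\theta'}(x)\right)\right| \leq C(K)\,|f_\theta - f_{\theta'}|_{V^\beta}\,\zeta_\kappa(a)\,V^{\beta+\alpha\kappa}(x). \qquad (27)$$

The first term in the rhs can be rewritten as

$$\sum_{j\geq 0}(1-a)^{j+1}\left(P_\theta^j - P_{\theta'}^j\right)f_\theta(x) = \sum_{j\geq 1}(1-a)^{j+1}\sum_{l=0}^{j-1}P_{\theta'}^{l}\left(P_\theta - P_{\theta'}\right)P_\theta^{j-l-1}f_\theta(x).$$

From A1 with $\kappa = \delta$, we have $|P_\theta^l f_\theta(x)| \leq C(K)\sup_{\theta\in K}|f_\theta|_{V^\beta}(1+l)^{-\delta}V^{\beta+\alpha\delta}(x)$ for all $l \geq 0$. Combined with the definition of $D_{\beta+\alpha\delta}$, we get

$$\left|(P_\theta - P_{\theta'})P_\theta^{j-l-1}f_\theta(x)\right| \leq C(K)\sup_{\theta\in K}|f_\theta|_{V^\beta}\,(j-l)^{-\delta}\,D_{\beta+\alpha\delta}(\theta,\theta')\,V^{\beta+\alpha\delta}(x).$$

Another application of A1 then yields for any $\kappa \in [0, \alpha^{-1}(1-\beta)-1-\delta]$

$$\left|P_\theta^j f_\theta(x) - P_{\theta'}^j f_\theta(x)\right| \leq C(K)\sup_{\theta\in K}|f_\theta|_{V^\beta}\,D_{\beta+\alpha\delta}(\theta,\theta')\,V^{\beta+\alpha(\kappa+\delta)}(x)\sum_{l=0}^{j-1}(1+l)^{-\kappa}(j-l)^{-\delta}.$$

It follows that

$$\left|\sum_{j\geq 1}(1-a)^{j+1}\left(P_\theta^j f_\theta(x) - P_{\theta'}^j f_\theta(x)\right)\right| \leq C(K)\sup_{\theta\in K}|f_\theta|_{V^\beta}\,D_{\beta+\alpha\delta}(\theta,\theta')\,V^{\beta+\alpha(\kappa+\delta)}(x)\sum_{j\geq 1}(1-a)^{j+1}\sum_{l=0}^{j-1}(1+l)^{-\kappa}(j-l)^{-\delta}$$
$$\leq C(K)\sup_{\theta\in K}|f_\theta|_{V^\beta}\,\zeta_\kappa(a)\zeta_\delta(a)\,D_{\beta+\alpha\delta}(\theta,\theta')\,V^{\beta+\alpha(\kappa+\delta)}(x).$$

Combining this with (27) gives part (i).

To prove (ii), we write $|g(x,\theta) - g(x,\theta')| \leq |g_a(x,\theta) - g(x,\theta)| + |g_a(x,\theta) - g_a(x,\theta')| + |g_a(x,\theta') - g(x,\theta')|$. Part (i) gives

$$|g_a(x,\theta) - g_a(x,\theta')| \leq C(K)\sup_{\theta\in K}|f_\theta|_{V^\beta}\,\zeta_\kappa(a)\left(\zeta_\delta(a)\,D_{\beta+\alpha\delta}(\theta,\theta') + |f_\theta - f_{\theta'}|_{V^\beta}\right)V^{\beta+\alpha(\delta+\kappa)}(x).$$

Then we use $\delta > 1$ and Part (ii) of Proposition 3.2 to get

$$|g_a(x,\theta) - g(x,\theta)| + |g_a(x,\theta') - g(x,\theta')| \leq C(K)\sup_{\theta\in K}|f_\theta|_{V^\beta}\int_0^a \zeta_{\delta-1}(u)\,du\;V^{\beta+\alpha\delta}(x).$$

The conclusion follows.

□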



3.2. Modulated moments. In this section, $K$ is an arbitrary compact subset of $\Theta$, $(x,\theta) \in \mathsf{X}\times K$ and $l \geq 0$ an integer. We consider the nonhomogeneous Markov chain $\{(\tilde X_n, \tilde\theta_n), n \geq 0\}$ with initial distribution $\delta_{x,\theta}$ and transition kernels $P_l(n; (x_1,\theta_1), (dx', d\theta')) = R(l+n; (x_1,\theta_1), (dx', d\theta'))$. Its distribution and expectation operator are denoted respectively by $\mathbb{P}^{(l)}_{x,\theta}$ and $\mathbb{E}^{(l)}_{x,\theta}$. The key property that we will use here is, as we have seen, a consequence of A1. The first two propositions below are easy modifications of similar results proved in Atchadé and Fort (2008).



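Proposition 3.4. Assume A1. There exists a finite constant $C(K)$ such that for any $(x,\theta) \in \mathsf{X}\times K$, $l \geq 0$, $n \geq 1$,

(i) for any $0 \leq \beta \leq 1$,

$$\mathbb{E}^{(l)}_{x,\theta}\left[V^\beta(\tilde X_n)\mathbb{1}_{\{\tilde\tau_K > n-1\}}\right] \leq C(K)\,n^\beta\,V^\beta(x);$$

(ii) for any $0 \leq \beta \leq 1-\alpha$,

$$\mathbb{E}^{(l)}_{x,\theta}\left[\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k-1\}}V^\beta(\tilde X_k)\right] \leq C(K)\,n\,V^{\beta+\alpha}(x).$$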



Proposition 3.5. Assume A1. Let $\{r_n, n \geq 0\}$ be a non-increasing sequence of positive numbers. For $\beta \in [0, 1-\alpha]$, there exists a finite constant $C(K)$ such that for any $(x,\theta) \in \mathsf{X}\times K$, $1 \leq n \leq N$



E 



(0 



N-l 



k=n 



1} 



N 



< C(K) ( r„E« (V^(X n )l {7K>n _ 1} ) + £r fc+1 



k=n 



The next proposition gives a standard bound on moments of martingales as a consequence of Burkholder's inequality.

Proposition 3.6. Let $M_n = \sum_{k=1}^n D_k$, $n \geq 1$ be a martingale such that $\mathbb{E}(|D_k|^p) < \infty$ for some $p > 1$. Then

$$\mathbb{E}\left[|M_n|^p\right] \leq C\,n^{\max(1,p/2)-1}\sum_{k=1}^n \mathbb{E}\left(|D_k|^p\right),$$

for $C = (18\,p\,q^{1/2})^p$, $p^{-1} + q^{-1} = 1$.
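As a concrete instance of the bound in Proposition 3.6, one can take $D_k$ i.i.d. fair $\pm 1$ signs and $p = 4$, for which $\mathbb{E}[M_n^4]$ can be computed exactly by enumeration. The sketch below is only an illustration under these assumptions (the sign increments and the helper name are choices for this example, not from the paper); it shows that the moment grows like $n^{p/2}$, which is the order allowed by the factor $n^{\max(1,p/2)-1}\sum_k \mathbb{E}|D_k|^p$.

```python
from itertools import product

def fourth_moment_rademacher(n):
    """Exact E[M_n^4] for M_n = D_1 + ... + D_n with i.i.d. fair +/-1 increments."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return total / 2 ** n

p = 4.0
q = p / (p - 1.0)                      # conjugate exponent, 1/p + 1/q = 1
C = (18.0 * p * q ** 0.5) ** p         # constant as stated in Proposition 3.6

for n in range(1, 13):
    lhs = fourth_moment_rademacher(n)
    assert lhs == 3 * n * n - 2 * n     # classical closed form for sign sums
    sum_moments = float(n)              # sum_k E|D_k|^4 = n because |D_k| = 1
    rhs = C * n ** (max(1.0, p / 2.0) - 1.0) * sum_moments
    assert lhs <= rhs
```

The constant $C$ is far from tight in this example; only the polynomial order in $n$ is being illustrated.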

3.3. A Weak law of large numbers. We fix $l \geq 0$ integer, $K$ a compact subset of $\Theta$ and $(x,\theta) \in \mathsf{X}\times K$. This section deals with the weak law of large numbers for the non-homogeneous Markov chain $\{(\tilde X_n, \tilde\theta_n), n \geq 0\}$ with initial distribution $\delta_{x,\theta}$ and transition kernels $P_l(n; (x_1,\theta_1), (dx', d\theta')) = R(l+n; (x_1,\theta_1), (dx', d\theta'))$.

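Proposition 3.7. Assume A1. Let $\beta \in [0, 1-\alpha)$ and $\{f_\theta \in \mathcal{L}_{V^\beta}, \theta \in \Theta\}$ a class of functions such that $\theta \to f_\theta(x)$ is a measurable map, $\pi(f_\theta) = 0$ and $\sup_{\theta\in K}|f_\theta|_{V^\beta} < \infty$. Suppose also that there exist $\epsilon > 0$, $\kappa > 0$ such that $\beta + \alpha\kappa < 1-\alpha$ and

$$\mathbb{E}^{(l)}_{x,\theta}\left[\sum_{k\geq 1}\mathbb{1}_{\{\tilde\tau_K > k\}}\,k^{-1+\epsilon}\left(D_\beta(\tilde\theta_k,\tilde\theta_{k-1}) + |f_{\tilde\theta_k} - f_{\tilde\theta_{k-1}}|_{V^\beta}\right)V^{\beta+\alpha\kappa}(\tilde X_k)\right] < \infty. \qquad (28)$$

Then $n^{-1}\mathbb{1}_{\{\tilde\tau_K > n\}}\sum_{k=1}^n f_{\tilde\theta_{k-1}}(\tilde X_k)$ converges to zero in $\mathbb{P}^{(l)}_{x,\theta}$-probability.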
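Proof. Define $H_{a,\theta}(x,y) = g_a(y,\theta) - P_\theta g_a(x,\theta)$ and $S_n = \sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k-1\}}f_{\tilde\theta_{k-1}}(\tilde X_k)$. Note that $\mathbb{1}_{\{\tilde\tau_K > n\}}n^{-1}\sum_{k=1}^n f_{\tilde\theta_{k-1}}(\tilde X_k) = \mathbb{1}_{\{\tilde\tau_K > n\}}n^{-1}S_n$. Then we use (22) to re-write $S_n$ as:

$$S_n = \sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k-1\}}H_{a_n,\tilde\theta_{k-1}}(\tilde X_{k-1},\tilde X_k) + \left((1-a_n)^{-1}-1\right)\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k-1\}}g_{a_n}(\tilde X_k,\tilde\theta_{k-1})$$
$$+ \left(P_{\tilde\theta_0}g_{a_n}(\tilde X_0,\tilde\theta_0) - \mathbb{1}_{\{\tilde\tau_K > n\}}P_{\tilde\theta_n}g_{a_n}(\tilde X_n,\tilde\theta_n)\right)$$
$$+ \sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k\}}\left(P_{\tilde\theta_k}g_{a_n}(\tilde X_k,\tilde\theta_k) - P_{\tilde\theta_{k-1}}g_{a_n}(\tilde X_k,\tilde\theta_{k-1})\right) - \sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K = k\}}P_{\tilde\theta_{k-1}}g_{a_n}(\tilde X_k,\tilde\theta_{k-1}).$$

We take $a_n \propto n^{-\rho} \in (0,1/2]$ where $\rho > 0$ is such that $\rho(1-\kappa) < \min(0.5, \alpha, 1-p^{-1})$ where $p = (1-\alpha)(\beta+\alpha\kappa)^{-1} > 1$; and $\rho(2-\kappa) \leq \epsilon$, where $\kappa$ and $\epsilon$ are as in (28). First, we notice that

$$\mathbb{1}_{\{\tilde\tau_K > n\}}\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K = k\}}P_{\tilde\theta_{k-1}}g_{a_n}(\tilde X_k,\tilde\theta_{k-1}) = 0.$$

Then we consider the term $M_{n,k} := \sum_{j=1}^k \mathbb{1}_{\{\tilde\tau_K > j-1\}}H_{a_n,\tilde\theta_{j-1}}(\tilde X_{j-1},\tilde X_j)$. Clearly, $\{(M_{n,k}, \mathcal{F}_k), 1 \leq k \leq n\}$ is a martingale array. Applying Proposition 3.2 and Proposition 3.6 (with $p = (1-\alpha)/(\beta+\alpha\kappa) > 1$), we get

$$\mathbb{E}^{(l)}_{x,\theta}\left(|M_{n,n}|^p\right) \leq C\,n^{\max(1,p/2)-1}\sum_{k=1}^n \mathbb{E}^{(l)}_{x,\theta}\left(\mathbb{1}_{\{\tilde\tau_K > k-1\}}\left|H_{a_n,\tilde\theta_{k-1}}(\tilde X_{k-1},\tilde X_k)\right|^p\right) = O\left(n^{p\rho(1-\kappa)}\,n^{\max(1,p/2)}\right).$$

By the choice of $\rho$, $\rho(1-\kappa) + \max(0.5, p^{-1}) < 1$ and we conclude that $M_{n,n}/n$ converges in $L^p$ to zero.

Define $R_n^{(1)} := \left((1-a_n)^{-1}-1\right)\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k-1\}}g_{a_n}(\tilde X_k,\tilde\theta_{k-1})$. Proposition 3.2 (i), together with Proposition 3.4 (ii), implies that

$$\mathbb{E}^{(l)}_{x,\theta}\left(n^{-1}|R_n^{(1)}|\right) \leq C(K)\,a_n\zeta_\kappa(a_n)\,n^{-1}\,\mathbb{E}^{(l)}_{x,\theta}\left(\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k-1\}}V^{\beta+\alpha\kappa}(\tilde X_k)\right) = O\left(a_n^{\kappa}\right).$$

The rhs converges to zero since $a_n \to 0$ and $\kappa > 0$.

We turn to $R_n^{(2)} := P_{\tilde\theta_0}g_{a_n}(\tilde X_0,\tilde\theta_0) - \mathbb{1}_{\{\tilde\tau_K > n\}}P_{\tilde\theta_n}g_{a_n}(\tilde X_n,\tilde\theta_n)$. Again, by Proposition 3.2 (i), the drift condition in A1 and Proposition 3.4 (i)

$$\mathbb{E}^{(l)}_{x,\theta}\left(n^{-1}|R_n^{(2)}|\right) \leq C\,n^{-1}\zeta_\kappa(a_n)\,\mathbb{E}^{(l)}_{x,\theta}\left(V^{\beta+\alpha\kappa}(\tilde X_0) + \mathbb{1}_{\{\tilde\tau_K > n\}}V^{\beta+\alpha\kappa}(\tilde X_n)\right)$$
$$= O\left(n^{-1+\beta+\alpha\kappa}\,a_n^{-1+\kappa}\right) = O\left(n^{-\alpha+\rho(1-\kappa)}\right).$$

Given the assumption $\rho(1-\kappa) < \alpha$, it follows that $n^{-1}R_n^{(2)}$ converges in probability to zero.

We finally turn to $R_n^{(3)} := \sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k\}}\left(P_{\tilde\theta_k}g_{a_n}(\tilde X_k,\tilde\theta_k) - P_{\tilde\theta_{k-1}}g_{a_n}(\tilde X_k,\tilde\theta_{k-1})\right)$. By definition, $P_\theta g_a(x,\theta) - P_{\theta'}g_a(x,\theta') = f_{\theta'}(x) - f_\theta(x) + (1-a)^{-1}\left(g_a(x,\theta) - g_a(x,\theta')\right)$. By Proposition 3.3 (with $\delta = 0$) we have:

$$\left|P_\theta g_{a_n}(x,\theta) - P_{\theta'}g_{a_n}(x,\theta')\right| \leq C(K)\sup_{\theta\in K}|f_\theta|_{V^\beta}\,a_n^{-2+\kappa}\left(D_\beta(\theta,\theta') + |f_\theta - f_{\theta'}|_{V^\beta}\right)V^{\beta+\alpha\kappa}(x)$$
$$\leq C(K)\sup_{\theta\in K}|f_\theta|_{V^\beta}\,n^{\epsilon}\left(D_\beta(\theta,\theta') + |f_\theta - f_{\theta'}|_{V^\beta}\right)V^{\beta+\alpha\kappa}(x).$$

Therefore Kronecker's lemma and (28) imply that $n^{-1}R_n^{(3)}$ converges almost surely to zero. □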

The next result will be useful in proving the central limit theorem. We take $f \in \mathcal{L}_{V^\beta}$ and let $g_a$ be the resolvent associated with $f$ and $H_{a,\theta}(x,y) := g_a(y,\theta) - P_\theta g_a(x,\theta)$. We will show in the next lemma that $n^{-1/2}\mathbb{1}_{\{\tilde\tau_K > n\}}\sum_{k=1}^n f(\tilde X_k)$ behaves like the martingale array $n^{-1/2}\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k-1\}}H_{a_n,\tilde\theta_{k-1}}(\tilde X_{k-1},\tilde X_k)$ as $n \to \infty$ for some well chosen sequence $\{a_n, n \geq 0\}$.
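The two facts about the resolvent that drive this martingale approximation can be seen on a toy example. The sketch below is illustrative only: the 3-state kernel `P`, the function `f`, the value of `a` and the helper names are hypothetical choices, and a single fixed kernel is used instead of the family $\{P_\theta\}$. It checks the approximate Poisson equation $f = (1-a)^{-1}g_a - Pg_a$ for $g_a = \sum_{j\geq 0}(1-a)^{j+1}P^j f$, and that $H_a(x,y) = g_a(y) - Pg_a(x)$ integrates to zero under $P(x,dy)$, which is what makes the increments martingale differences.

```python
def mat_vec(P, v):
    """Apply a finite-state kernel P (list of rows) to a function v on the states."""
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]

def resolvent(P, f, a, terms=400):
    """Truncated g_a = sum_{j>=0} (1-a)^(j+1) P^j f for a finite-state kernel P."""
    g = [0.0] * len(f)
    pj_f = list(f)                         # holds P^j f, starting with j = 0
    for j in range(terms):
        w = (1.0 - a) ** (j + 1)
        g = [gi + w * vi for gi, vi in zip(g, pj_f)]
        pj_f = mat_vec(P, pj_f)
    return g

# Hypothetical 3-state kernel and test function (illustration only).
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]
f = [1.0, -2.0, 0.5]
a = 0.2

g_a = resolvent(P, f, a)
Pg_a = mat_vec(P, g_a)

# Approximate Poisson equation: f = (1-a)^{-1} g_a - P g_a.
residual = [g_a[i] / (1.0 - a) - Pg_a[i] - f[i] for i in range(3)]
assert max(abs(r) for r in residual) < 1e-10

# H_a(x, y) = g_a(y) - P g_a(x) integrates to zero under P(x, dy).
for x in range(3):
    assert abs(sum(P[x][y] * (g_a[y] - Pg_a[x]) for y in range(3))) < 1e-12
```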

Lemma 3.8. Assume A1 with $\alpha < 1/2$ and let $K$ be a compact subset of $\Theta$. Let $\beta \geq 0$ be such that $2(\beta+\alpha) < 1$ and $f \in \mathcal{L}_{V^\beta}$ such that $\pi(f) = 0$. Let $\kappa > 1$, $\delta \in (0,1)$ be such that $2\beta + \alpha(\kappa+\delta) < 1-\alpha$. Take $\rho \in (1/2, 1/(2-\delta)]$ and let $\{a_n, n \geq 0\}$ be a sequence of positive numbers such that $a_n \in (0,1/2]$, $a_n \propto n^{-\rho}$. Suppose that



■a 



fc>l 



< oo. (29) 



■m 



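For any $s \geq 0$, $n^{-1/2}\mathbb{1}_{\{\tilde\tau_K > n\}}\sum_{k=1}^n\left(f(\tilde X_k) - H_{a_{n+s},\tilde\theta_{k-1}}(\tilde X_{k-1},\tilde X_k)\right)$ converges to zero in $\mathbb{P}^{(l)}_{x,\theta}$-probability.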

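Proof. Without any loss of generality, we assume that $\kappa$ also satisfies $\beta + \alpha\kappa < 1/2$. For $s \geq 0$ arbitrary, define $S_{n,s} := \sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k-1\}}\left(f(\tilde X_k) - H_{a_{n+s},\tilde\theta_{k-1}}(\tilde X_{k-1},\tilde X_k)\right)$. Note that

$$\mathbb{1}_{\{\tilde\tau_K > n\}}\,n^{-1/2}\sum_{k=1}^n\left(f(\tilde X_k) - H_{a_{n+s},\tilde\theta_{k-1}}(\tilde X_{k-1},\tilde X_k)\right) = \mathbb{1}_{\{\tilde\tau_K > n\}}\,n^{-1/2}S_{n,s}.$$

Then we use the approximate Poisson equation (22) to re-write $S_{n,s}$ as:

$$S_{n,s} = \left((1-a_{n+s})^{-1}-1\right)\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k-1\}}g_{a_{n+s}}(\tilde X_k,\tilde\theta_{k-1})$$
$$+ \left(P_{\tilde\theta_0}g_{a_{n+s}}(\tilde X_0,\tilde\theta_0) - \mathbb{1}_{\{\tilde\tau_K > n\}}P_{\tilde\theta_n}g_{a_{n+s}}(\tilde X_n,\tilde\theta_n)\right)$$
$$+ \sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k\}}\left(P_{\tilde\theta_k}g_{a_{n+s}}(\tilde X_k,\tilde\theta_k) - P_{\tilde\theta_{k-1}}g_{a_{n+s}}(\tilde X_k,\tilde\theta_{k-1})\right)$$
$$- \sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K = k\}}P_{\tilde\theta_{k-1}}g_{a_{n+s}}(\tilde X_k,\tilde\theta_{k-1}).$$

Notice that $\mathbb{1}_{\{\tilde\tau_K > n\}}\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K = k\}}P_{\tilde\theta_{k-1}}g_{a_{n+s}}(\tilde X_k,\tilde\theta_{k-1}) = 0$. For the rest, consider $R_n^{(1)} := P_{\tilde\theta_0}g_{a_{n+s}}(\tilde X_0,\tilde\theta_0) - \mathbb{1}_{\{\tilde\tau_K > n\}}P_{\tilde\theta_n}g_{a_{n+s}}(\tilde X_n,\tilde\theta_n)$. By Proposition 3.2, the choice $\kappa > 1$, and by Proposition 3.4 (i) we have

$$\mathbb{E}^{(l)}_{x,\theta}\left(|R_n^{(1)}|\right) \leq C(K)\,\mathbb{E}^{(l)}_{x,\theta}\left(V^{\beta+\alpha\kappa}(\tilde X_0) + V^{\beta+\alpha\kappa}(\tilde X_n)\mathbb{1}_{\{\tilde\tau_K > n\}}\right) = O\left(n^{\beta+\alpha\kappa}\right).$$

Since $\beta + \alpha\kappa < 1/2$ we deduce that $n^{-1/2}R_n^{(1)} \to 0$ in probability.

Now take $R_n^{(2)} := \left((1-a_{n+s})^{-1}-1\right)\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k-1\}}g_{a_{n+s}}(\tilde X_k,\tilde\theta_{k-1})$. We can apply Proposition 3.2 to obtain

$$|R_n^{(2)}| \leq C(K)\,a_{n+s}\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k-1\}}V^{\beta+\alpha\kappa}(\tilde X_k)$$

and by Proposition 3.4 (ii), $\mathbb{E}^{(l)}_{x,\theta}\left(n^{-1/2}|R_n^{(2)}|\right) = O\left(n^{1/2}a_n\right)$. By assumption $a_n \propto n^{-\rho}$ with $\rho > 1/2$, thus $n^{-1/2}R_n^{(2)}$ converges in probability to zero.

Finally, we consider $R_n^{(3)} := \sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k\}}\left(P_{\tilde\theta_k}g_{a_{n+s}}(\tilde X_k,\tilde\theta_k) - P_{\tilde\theta_{k-1}}g_{a_{n+s}}(\tilde X_k,\tilde\theta_{k-1})\right)$. By definition,

$$P_\theta g_a(x,\theta) - P_{\theta'}g_a(x,\theta') = (1-a)^{-1}\left(g_a(x,\theta) - g_a(x,\theta')\right),$$

and by Proposition 3.3 applied with $\kappa > 1$ and $\delta > 0$, $|P_\theta g_a(x,\theta) - P_{\theta'}g_a(x,\theta')| \leq C(K)\,\zeta_\delta(a)\,D_{\beta+\alpha\delta}(\theta,\theta')\,V^{\beta+\alpha(\kappa+\delta)}(x)$ so that

$$n^{-1/2}|R_n^{(3)}| \leq C(K)\,n^{-1/2}\,\zeta_\delta(a_{n+s})\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_K > k\}}D_{\beta+\alpha\delta}(\tilde\theta_k,\tilde\theta_{k-1})\,V^{\beta+\alpha(\kappa+\delta)}(\tilde X_k).$$

By assumption $n^{-1/2}n^{\rho(1-\delta)} = o\left(n^{-1+\rho(2-\delta)}\right)$. Kronecker's lemma and (29) then give that $n^{-1/2}R_n^{(3)}$ converges to 0 with probability one. □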

3.4. Connection with the adaptive MCMC process. In this section we give a number of results that connect the non-homogeneous Markov chain $\{(\tilde X_n, \tilde\theta_n), n \geq 0\}$ with the adaptive MCMC process $\{(X_n, \theta_n, \nu_n, \xi_n), n \geq 0\}$ defined in Section 2.2. This will allow us to transfer the limit results established above to the adaptive chain.
We introduce the sequence of stopping times associated with the adaptive chain

$$T_0 = 0, \qquad T_{j+1} = \inf\{k > T_j,\ \xi_k = 0\}, \quad j \geq 0,$$

with the convention that $\inf\emptyset = \infty$. Also define

$$\nu_\infty := \sup_{k\geq 0}\nu_k.$$

Lemma 3.9. If A2 holds then $\mathbb{P}(T_{\nu_\infty} < \infty) = 1$.

Proof. A2 states that $\mathbb{P}(\nu_\infty < \infty) = 1$. Thus under A2

$$\mathbb{P}(T_{\nu_\infty} = \infty) = \sum_{j\geq 0}\mathbb{P}(T_j = \infty,\ \nu_\infty = j) = 0,$$

the last equality follows from the fact that on the set $\{T_j = +\infty\}$, $\sup_{k\geq 0}\nu_k \leq j-1$. Hence, $\mathbb{P}(T_{\nu_\infty} < +\infty) = 1$. □



The following is Lemma 4.1 of Andrieu et al. (2005).



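Proposition 3.10. For any $n \in \mathbb{N}$, any $n$-uplet $(t_1, \cdots, t_n)$, any bounded measurable functions $\{f_k, k \leq n\}$ and for any $(x,\theta,j) \in \mathsf{X}\times K_j\times\mathbb{N}$,

$$\mathbb{E}_{x,\theta,j,0}\left[\prod_{k=1}^n f_k(X_{t_k},\theta_{t_k})\,\mathbb{1}_{\{T_1 > t_n\}}\right] = \mathbb{E}^{(j)}_{x,\theta}\left[\prod_{k=1}^n f_k(\tilde X_{t_k},\tilde\theta_{t_k})\,\mathbb{1}_{\{\tilde\tau_{K_j} > t_n\}}\right].$$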



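One can obtain the finiteness of moments of the adaptive chain as in the following lemma.

Lemma 3.11. Let $W_n = W(X_n,\theta_n,X_{n+1})$ be a sequence of random variables such that for all $l, k \leq n$,

$$\sup_{(x,\theta)\in\mathsf{X}_0\times\Theta_0}\mathbb{E}^{(l)}_{x,\theta}\left[W(\tilde X_k,\tilde\theta_k,\tilde X_{k+1})\,\mathbb{1}_{\{\tilde\tau_{K_l} > k\}}\right] < \infty.$$

Then $\mathbb{E}\left(W(X_n,\theta_n,X_{n+1})\right)$ is finite.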



Proof. Denote W n = W(X n , 9 n , X n+ i). We have 

n n 11 n 

®m = EE 1 Kv=;} v,-}] = EE 1 K^^w^m} 

j=0 s=j j=0 s=j 

n n 

= [ 1 {rj=s}^X s ,6» s ,j,0 {W(X n - s ,9 n _ Sl X n+ i_ s )l{ Tl>n ^ s y] 
j=0 s=j 



EE* 

j=0 s=j 



U) 



j=0 s=j 



The last equality uses Proposition 3.10.



00. 



□ 



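In very general terms, the next result shows that a weak law of large numbers for the re-projection free process $\{(\tilde X_n, \tilde\theta_n), n \geq 0\}$ implies a similar result for the adaptive chain.

Lemma 3.12. Assume A2. Let $\{W_{n,k}, 1 \leq k \leq n\}$ be a triangular array of random variables of the form $W_{n,k} = W_n(\theta_{k-1},X_{k-1},\theta_k,X_k)$ for some measurable functions $W_n: \Theta\times\mathsf{X}\times\Theta\times\mathsf{X} \to \mathbb{R}$. Let $\{b_n, n \geq 1\}$ be a non-increasing sequence of positive numbers with $\lim_{n\to\infty}b_n = 0$. Suppose that for any $k \geq 1$, $\sup_{n\geq 1}|W_n(\theta_{k-1},X_{k-1},\theta_k,X_k)| < \infty$ $\mathbb{P}$-a.s. and for all $l \geq 0$, $s \geq 0$, $(x,\theta) \in \mathsf{X}\times K_l$ and $\delta > 0$

$$\lim_{n\to\infty}\mathbb{P}^{(l)}_{x,\theta}\left(b_n\,\mathbb{1}_{\{\tilde\tau_{K_l} > n\}}\left|\sum_{k=1}^n \tilde W_{n+s,k}\right| > \delta\right) = 0,$$

where $\tilde W_{n,k} = W_n(\tilde\theta_{k-1},\tilde X_{k-1},\tilde\theta_k,\tilde X_k)$; then $b_n\sum_{k=1}^n W_n(\theta_{k-1},X_{k-1},\theta_k,X_k)$ converges to zero in $\mathbb{P}$-probability as $n \to \infty$.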



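Proof. The idea of the proof is similar to the proof of Proposition 6 of Andrieu and Moulines (2006). Write $W_{n,k} = W_n(\theta_{k-1},X_{k-1},\theta_k,X_k)$. As shown above A2 implies that $T_{\nu_\infty}$ is finite $\mathbb{P}$-a.s. With the convention that $\sum_a^b \cdot = 0$ if $a > b$, we write:

$$b_n\sum_{k=1}^n W_{n,k} = b_n\sum_{k=1}^{n\wedge T_{\nu_\infty}}W_{n,k} + b_n\sum_{k=T_{\nu_\infty}+1}^n W_{n,k} = b_nS_n^{(1)} + b_nS_n^{(2)},$$

where $S_n^{(1)} = \sum_{k=1}^{n\wedge T_{\nu_\infty}}W_{n,k}$ and $S_n^{(2)} = \sum_{k=T_{\nu_\infty}+1}^n W_{n,k}$. Since $\sup_{n\geq 1}|W_{n,k}|$ and $T_{\nu_\infty}$ are finite $\mathbb{P}$-a.s., we deduce that $|S_n^{(1)}| \leq \sum_{k=1}^{T_{\nu_\infty}}\sup_{n\geq 1}|W_{n,k}|$ is also finite $\mathbb{P}$-a.s. Therefore $b_nS_n^{(1)}$ converges to zero $\mathbb{P}$-a.s.

Take $\epsilon > 0$. From Lemma 3.9 we can find $L_2 > L_1 > 0$ such that $\mathbb{P}[\nu_\infty > L_1] + \mathbb{P}[T_{\nu_\infty} > L_2] < \epsilon$. For any $\delta > 0$ and $n > L_2$, we have:

$$\mathbb{P}\left[b_n|S_n^{(2)}| > \delta\right] \leq \sum_{l=0}^{L_1}\sum_{s=0}^{L_2}\mathbb{P}\left[b_n|S_n^{(2)}| > \delta,\ T_l = s,\ \nu_\infty = l\right] + \epsilon.$$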

We then observe that the event 

[b n \SW\ >S,T t = 8,1^00 = /} C < 

Therefore by conditioning on Tt x , we get: 
F\b n \S^\> 6,^ = 8,^ = 1^ 

/ n—s 



>nAT„, 



j(2) 



+ e. 



T,+n-s 



&n-s| E W njk \t{ Tl+1>T[+n _ s y > 5 
k=Tr+l 





n—s 








l{ri>n-s} 




k=l 





E 



x T{ ,e Tl 



b n - 



fc=l 



{r K; >?t-s} 



The last equality follows from Proposition 3.10. By assumption, the inner term in the last expectation above converges almost surely to zero. It follows from Lebesgue's dominated convergence theorem that $\lim_{n\to\infty}\mathbb{P}\left(b_n|S_n^{(2)}| > \delta\right) \leq \epsilon$. Since $\epsilon > 0$ is arbitrary, the result follows. □



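3.5. Proof of Theorem 2.1. Since A1 and (8) hold, we can apply Proposition 3.7 which implies that

$$\lim_{n\to\infty}\mathbb{P}^{(l)}_{x,\theta}\left(n^{-1}\,\mathbb{1}_{\{\tilde\tau_{K_l} > n\}}\left|\sum_{k=1}^n f_{\tilde\theta_{k-1}}(\tilde X_k)\right| > \delta\right) = 0,$$

for any $\delta > 0$, $l \geq 0$ and $(x,\theta) \in \mathsf{X}\times K_l$. Theorem 2.1 then follows from Lemma 3.12.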



3.6. Proof of Theorem 2.3. Throughout the proof, we take $\kappa > 1$, $\delta \in (0,1)$, $\rho \in (1/2, (2-\delta)^{-1}]$ and $\{a_n, n \geq 0\}$ as in the statement of the theorem. Denote $S_n = \sum_{k=1}^n f(X_k)$. Without any loss of generality, we will assume that $|f|_{V^\beta} \leq 1$. We have

$$S_n = \sum_{k=1}^n H_{a_n,\theta_{k-1}}(X_{k-1},X_k)\,\mathbb{1}_{\{\xi_{k-1}\neq 0\}} + \sum_{k=1}^n H_{a_n,\theta_{k-1}}(X_{k-1},X_k)\,\mathbb{1}_{\{\xi_{k-1}=0\}}$$
$$+ \sum_{k=1}^n\left(f(X_k) - H_{a_n,\theta_{k-1}}(X_{k-1},X_k)\right).$$

™~ 1/2 £fe=i (f(X k ) - Ha^e^X^Xk)) converges in IP-probability 



By Theorem! 
to zero. 

Note that $\xi_k = 0$ signals a re-projection at time $k$. By Proposition 3.2 (i) applied with $\kappa > 1$,



^2 H °™fik-1 ( X k-l,X k )l {ik l ._ 



0} 



k=l 



< C(K V J £ {V'-^X,.,) + V x ~ a {X k )) , P - a.s. 



k=l 



and the rhs is finite $\mathbb{P}$-a.s. We then conclude that $n^{-1/2}\sum_{k=1}^n H_{a_n,\theta_{k-1}}(X_{k-1},X_k)\,\mathbb{1}_{\{\xi_{k-1}=0\}}$ converges to zero $\mathbb{P}$-a.s.



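Define $M_{n,k} = \sum_{j=1}^k D_{n,j}$, where $D_{n,j} = n^{-1/2}H_{a_n,\theta_{j-1}}(X_{j-1},X_j)\,\mathbb{1}_{\{\xi_{j-1}\neq 0\}}$. It is straightforward to see that $\{(M_{n,k}, \mathcal{F}_k), 1 \leq k \leq n\}$ is a martingale array. We will show that

$$\{(M_{n,k}, \mathcal{F}_k),\ 1 \leq k \leq n\}\ \text{is a square-integrable martingale array;} \qquad (30)$$

$$\lim_{n\to\infty}\sum_{k=1}^n \mathbb{E}\left(D_{n,k}^2\,\middle|\,\mathcal{F}_{k-1}\right) = \sigma^2(f), \quad \text{(in } \mathbb{P}\text{-probab.)} \qquad (31)$$

where

$$\sigma^2(f) = \int \pi(dx)\left(-f^2(x) + 2f(x)g(x,\theta_\star)\right), \qquad (32)$$

is finite $\mathbb{P}$-almost surely and that for any $\epsilon > 0$,

$$\lim_{n\to\infty}\sum_{k=1}^n \mathbb{E}\left(D_{n,k}^2\,\mathbb{1}_{\{|D_{n,k}| \geq \epsilon\}}\,\middle|\,\mathcal{F}_{k-1}\right) = 0, \quad \text{(in } \mathbb{P}\text{-probab.)} \qquad (33)$$

By the central limit theorem for martingales (see e.g. Hall and Heyde (1980), Corollary 3.1), (30)-(33) imply that $M_{n,n}$ converges weakly to $Z$, where $Z$ is a random variable with characteristic function $\phi(t) = \mathbb{E}\left(e^{-\frac{1}{2}\sigma^2(f)t^2}\right)$. This will end the proof.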
Proof of (30). It suffices to show that for all $l \geq 0$, $k, n \geq 1$,



sup E®Jh* s (X fc _i,^ fc )l. v _ n > ) 



< oo 



(3 

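and to apply Lemma 3.11. By Proposition 3.2 (i) (applied with both $\kappa > 1$ and $\delta > 0$),

$\sup_{\theta\in K}|g_a(x,\theta)|^2 \leq C(K)\,\zeta_\kappa(a)\zeta_\delta(a)\,V^{2\beta+\alpha(\kappa+\delta)}(x) \leq C'(K)\,\zeta_\delta(a)\,V^{1-\alpha}(x)$ since by assumption $2\beta + \alpha(\kappa+\delta) \leq 1-\alpha$. Thus for any $l \geq 0$, $k, n \geq 1$ and $(x,\theta) \in \mathsf{X}\times K_l$,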

E 



il) n[H 2 - (X k -l,X k )l r <- , ^LFfc-l ) < l r - , nl Pfl fl 2 (Xfc_l,fi>fc_l) 

Prom Proposition 13.41 (i) we thus obtain 

sup E« (V - (X fc _!,X fc )l >fc n ) < C7(K0C«5(«n)^ a sup V 1 - ^) < oo. 



(x.^eXoxK; 
Proof of d31]) . 



n 



^Pg.^Hl^.^Xj-x) -l^._ 1=0} n 1 P 0j _ 1 Hl ni g._ i (X j _ 1 ] 
The same argument as above shows that 



n 

— 1 ' 

n 



3=1 3=1 

which converges almost surely to zero since T Uao is finite P-almost surely, (s(a n ) = 0(n p ^- 5 ^) 
and p(l -5) < 1/2. 

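For the first term, we note that $P_\theta H^2_{a,\theta}(x) = P_\theta g_a^2(x,\theta) - \left(P_\theta g_a(x,\theta)\right)^2 = P_\theta g_a^2(x,\theta) - \left((1-a)^{-1}g_a(x,\theta) - f(x)\right)^2$. We thus have the decomposition:

$$\frac{1}{n}\sum_{k=1}^n P_{\theta_{k-1}}H^2_{a_n,\theta_{k-1}}(X_{k-1}) = \frac{1}{n}\sum_{i=1}^6 T_n^{(i)} + \int \pi(dx)\left(-f^2(x) + 2f(x)g(x,\theta_\star)\right),$$

where

$$T_n^{(1)} = \sum_{k=1}^n P_{\theta_{k-1}}g^2_{a_n}(X_{k-1},\theta_{k-1}) - g^2_{a_n}(X_{k-1},\theta_{k-1}),$$

$$T_n^{(2)} = \left(1-(1-a_n)^{-2}\right)\sum_{k=1}^n g^2_{a_n}(X_{k-1},\theta_{k-1}),$$

$$T_n^{(3)} = 2\left((1-a_n)^{-1}-1\right)\sum_{k=1}^n f(X_{k-1})\,g_{a_n}(X_{k-1},\theta_{k-1}),$$

$$T_n^{(4)} = 2\sum_{k=1}^n f(X_{k-1})\left(g_{a_n}(X_{k-1},\theta_{k-1}) - g(X_{k-1},\theta_{k-1})\right),$$

$$T_n^{(5)} = 2\sum_{k=1}^n \int \pi(dx)f(x)\left(g(x,\theta_{k-1}) - g(x,\theta_\star)\right),$$

$$T_n^{(6)} = \sum_{k=1}^n\left[-f^2(X_{k-1}) + 2f(X_{k-1})g(X_{k-1},\theta_{k-1}) - \int \pi(dx)\left(-f^2(x) + 2f(x)g(x,\theta_{k-1})\right)\right].$$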

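By assumption $n^{-1}T_n^{(1)}$ converges in $\mathbb{P}$-probability to zero. We will use the same technique to study the terms $T_n^{(2)}$ to $T_n^{(5)}$. For example for $T_n^{(2)}$, the idea is to introduce its counterpart $\tilde T_n^{(2)}$ in the space of the re-projection free process $\{(\tilde X_n, \tilde\theta_n), n \geq 0\}$, to show that $\lim_{n\to\infty}\mathbb{P}^{(l)}_{x,\theta}\left(|\tilde T_n^{(2)}| > \delta\right) = 0$ for any $l \geq 0$, $\delta > 0$ and any $(x,\theta) \in \mathsf{X}\times K_l$; and then to argue that $\lim_{n\to\infty}\mathbb{P}\left(|T_n^{(2)}| > \delta\right) = 0$ for all $\delta > 0$ using Lemma 3.12.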



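Lemma 3.13. $n^{-1}\left(T_n^{(2)} + T_n^{(3)}\right)$ converges in probability to zero.

Proof. For $l, s \geq 0$, define

$$\tilde T_{n,s} = \left(1-(1-a_{n+s})^{-2}\right)\mathbb{1}_{\{\tilde\tau_{K_l} > n\}}\sum_{k=1}^n g^2_{a_{n+s}}(\tilde X_{k-1},\tilde\theta_{k-1})$$
$$+ \left((1-a_{n+s})^{-1}-1\right)\mathbb{1}_{\{\tilde\tau_{K_l} > n\}}\sum_{k=1}^n f(\tilde X_{k-1})\,g_{a_{n+s}}(\tilde X_{k-1},\tilde\theta_{k-1}).$$

We show that for any $\mu > 0$, and any $(x,\theta) \in \mathsf{X}\times K_l$, $\lim_{n\to\infty}\mathbb{P}^{(l)}_{x,\theta}\left(n^{-1}|\tilde T_{n,s}| > \mu\right) = 0$. Then we can apply Lemma 3.12 to conclude that $n^{-1}\left(T_n^{(2)} + T_n^{(3)}\right)$ converges in $\mathbb{P}$-probability to zero. As above, for any $(x,\theta) \in \mathsf{X}\times K_l$ and by Proposition 3.2 (i), we get

$$\mathbb{E}^{(l)}_{x,\theta}\left(|\tilde T_{n,s}|\right) \leq C(K_l)\left(\zeta_\delta(a_{n+s}) + 1\right)a_{n+s}\,\mathbb{E}^{(l)}_{x,\theta}\left(\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_{K_l} > k-1\}}V^{2\beta+\alpha(\kappa+\delta)}(\tilde X_k)\right) = O\left(n\,a_n^{\delta}\right).$$

The rest of the proof follows from the usual bounds on the $V$-moments. □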
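Lemma 3.14. $n^{-1}T_n^{(4)}$ converges in probability to zero.

Proof. For $l, s \geq 0$, define

$$\tilde T^{(4)}_{n,s} = \mathbb{1}_{\{\tilde\tau_{K_l} > n\}}\sum_{k=1}^n f(\tilde X_{k-1})\left(g_{a_{n+s}}(\tilde X_{k-1},\tilde\theta_{k-1}) - g(\tilde X_{k-1},\tilde\theta_{k-1})\right)$$
$$= \mathbb{1}_{\{\tilde\tau_{K_l} > n\}}\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_{K_l} > k-1\}}f(\tilde X_{k-1})\left(g_{a_{n+s}}(\tilde X_{k-1},\tilde\theta_{k-1}) - g(\tilde X_{k-1},\tilde\theta_{k-1})\right).$$

Again, for any $(x,\theta) \in \mathsf{X}\times K_l$ and by Proposition 3.2 (ii) we get

$$\mathbb{E}^{(l)}_{x,\theta}\left(n^{-1}|\tilde T^{(4)}_{n,s}|\right) \leq C(K_l)\,a_{n+s}\zeta_{\kappa-1}(a_{n+s})\,n^{-1}\,\mathbb{E}^{(l)}_{x,\theta}\left(\sum_{k=1}^n \mathbb{1}_{\{\tilde\tau_{K_l} > k-1\}}V^{2\beta+\alpha\kappa}(\tilde X_k)\right) = O\left(a_n\zeta_{\kappa-1}(a_n)\right).$$

The rest of the proof is similar to the above upon noticing that for $\kappa > 1$, $a\,\zeta_{\kappa-1}(a) \to 0$ as $a \to 0$. □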

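Lemma 3.15. $n^{-1}T_n^{(5)}$ converges $\mathbb{P}$-almost surely to zero.

Proof. By Proposition 3.3 (ii), there exists a finite constant $C(K)$ such that for any $\theta, \theta' \in K$, $x \in \mathsf{X}$ and any $a \in (0,1/2]$

$$|g(x,\theta) - g(x,\theta')| \leq C(K)\left(a\,\zeta_{\kappa-1}(a) + a^{-1}D_{\beta+\alpha\kappa}(\theta,\theta')\right)V^{\beta+\alpha\kappa}(x).$$

Therefore

$$\left|\int \pi(dx)f(x)\left(g(x,\theta) - g(x,\theta')\right)\right| \leq C(K)\left(a\,\zeta_{\kappa-1}(a) + a^{-1}D_{\beta+\alpha\kappa}(\theta,\theta')\right)\pi\left(V^{2\beta+\alpha\kappa}\right).$$

Let $\epsilon > 0$. Since $a\,\zeta_{\kappa-1}(a) \to 0$ as $a \to 0$, we can find $a_0 \in (0,1/2]$ such that $a_0\zeta_{\kappa-1}(a_0) \leq \epsilon$. Then for $\mathbb{P}$-almost every sample path

$$\lim_{n\to\infty}\left|\int \pi(dx)f(x)\left(g(x,\theta_n) - g(x,\theta_\star)\right)\right| \leq C(K_{\nu_\infty})\lim_{n\to\infty}\left(\epsilon + a_0^{-1}D_{\beta+\alpha\kappa}(\theta_n,\theta_\star)\right)\pi\left(V^{2\beta+\alpha\kappa}\right) = \epsilon\,C(K_{\nu_\infty})\,\pi\left(V^{2\beta+\alpha\kappa}\right).$$

Since $\epsilon > 0$ is arbitrary and $\pi\left(V^{2\beta+\alpha\kappa}\right) < \infty$, we are finished. □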
Lemma 3.16. $n^{-1}T_n^{(6)}$ converges in probability to zero.

Proof. We would like to apply the law of large number (Theorem 12 .2p to show that 
converges to zero. By Proposition 13.21 (ii), for any compact subset K of G, supgg^ I/ 2 + 
2j 'ge\y20+a. K < oo and 2/3 + an < 1 — a. To check ©, it is enough to find e > such that 



E 



(0 



£ k ~ 1+e " /% Iv*h~ V Ki >k] V 2 ^ +&) (X k ) 
k>l 



< oo. 



(34) 



But by Proposition 13.31 (ii). there exists a finite constant C(K) such that for any 9,9' G K, 
ieX and any a G (0, 1/2] 

- /(•)<?(•, #')U+- < C(K)aa_i(a) + a^Dp+^iO, 9'). 
We let o depend on k by taking a = a k , therefore 



E 



(0 



fc>i 



< E 



(0 



fc>l 



+ E 



(0 



fc>i 
We can then find $\epsilon > 0$ such that $n^{\epsilon}a_n\zeta_{\kappa-1}(a_n) + n^{-1+\epsilon}a_n^{-1} = O(n^{-\epsilon})$ and (34) follows. □
Proof of (33). It suffices to show that



n „ 
fe=l J 



in $\mathbb{P}$-probability. We will do so by applying Lemma 3.12 again. By a lemma due to Dvoretzky (Lemma 9 of Andrieu and Moulines (2006))



where 

= y'- p e fc _ 1 (^fc-i,rfy)5'a n (y' 6l fc-i)l{| !?Qn ( y ,e fe _ 1 )|> ev ^/2}- 
It is thus enough to show that for any s, I > 0, any (x, 0) £ Xo x Kj, 

n 

Kmn" 1 E Vk >fc-i}^»+-.* = °' ( in ^-Probability). 
fc=i ' 

Take p > 2 such that + a/2) < 1 — a. Then 



<(2/£)-"(n + S )-^E^ % 



{r K; >fc-l} 
P/2 F (0 



{r K; >fc-l} 



ga n+s {Xk,0k-l 



{|9a n+s (X fe ,0 fc _ 1 )|>e A /i+i/2} 



< (2/e)-^(K / )n^/ 2 (C 1/2 (a n )) p E« ( 1 ^ >fe V 1 ^,,) 



{r K; >fe-l} 



It follows that 



, < , .(Ev« I >^.)' i '«')= ("" (M,! )- 



and since p < 1, we are done. 
3.7. Proof of Proposition 2.4.

Proof. Denote $g_a(x) = \sum_{j\geq 0}(1-a)^{j+1}P^jf(x)$, $H_a(x,y) = g_a(y) - Pg_a(x)$ and write $g$ and $H$ respectively when $a = 0$. Denote $L^2(\pi\times P)$ the $L^2$-space with respect to the joint measure $\pi(dx)P(x,dy)$ on $\mathsf{X}\times\mathsf{X}$. It is shown by Maxwell and Woodroofe (2000) (Proposition 1) that if $f \in L^2(\pi)$ and $\sum_{j\geq 1}j^{-1/2}\|P^jf\|_{L^2(\pi)} < \infty$ then there exists $H_\star \in L^2(\pi\times P)$ such that $\lim_{a\to 0}\|H_a - H_\star\|_{L^2(\pi\times P)} = 0$.

Under A1 and with $f \in \mathcal{L}_{V^\beta}$, $\beta \in [0, 1/2-\alpha)$, $\sum_{j\geq 1}j^{-1/2}\|P^jf\|_{L^2(\pi)} < \infty$ and thus there exists $H_\star \in L^2(\pi\times P)$ such that $\lim_{a\to 0}\|H_a - H_\star\|_{L^2(\pi\times P)} = 0$. Moreover $\pi\times P(H_\star^2) = \pi(f(2g-f))$ (see e.g. Holzmann (2005) for a derivation of this formula). From Proposition 3.2 (ii), we see that $H_\star = H$ ($\pi\times P$-a.s.). Note that $\pi\times P(H^2) = \pi\left(Pg^2 - g^2 + f(2g-f)\right)$ and $\pi(|f(2g-f)|) < \infty$ by Proposition 3.2 (i) and the fact that $\pi(V^{1-\alpha}) < \infty$. Thus it follows from $\pi\times P(H^2) < \infty$ and $\pi\times P(H^2) = \pi(f(2g-f))$ that $Pg^2 - g^2$ is $\pi$-integrable and $\pi(Pg^2 - g^2) = 0$.

On the other hand we have $PH_a^2(x) = Pg_a^2(x) - (Pg_a(x))^2 = Pg_a^2(x) - g_a^2(x) + g_a^2(x) - (Pg_a(x))^2$. Similarly $PH^2(x) = Pg^2(x) - g^2(x) + g^2(x) - (Pg(x))^2$. Using $Pg_a = (1-a)^{-1}g_a - f$ and $Pg = g - f$, after some algebra we get

$$\left(Pg_a^2(x) - g_a^2(x)\right) - \left(Pg^2(x) - g^2(x)\right) = PH_a^2(x) - PH^2(x) - 2\left((1-a)^{-1}-1\right)f(x)g_a(x)$$
$$- 2f(x)\left(g_a(x) - g(x)\right) + \left((1-a)^{-1}+1\right)\left((1-a)^{-1}-1\right)g_a^2(x).$$

We take $\kappa > 1$ and $\delta > 0$ such that $2\beta + \alpha(\kappa+\delta) \leq 1-\alpha$ and apply Proposition 3.2 to get

$$\left|\left(Pg_a^2(x) - g_a^2(x)\right) - \left(Pg^2(x) - g^2(x)\right)\right| \leq \left|PH_a^2(x) - PH^2(x)\right| + C\,a^{\delta}\,V^{2\beta+\alpha(\kappa+\delta)}(x),$$

for some finite constant $C$. It follows that

$$\int \pi(dx)\left|\left(Pg_a^2(x) - g_a^2(x)\right) - \left(Pg^2(x) - g^2(x)\right)\right| \leq \|H_a - H\|^2_{L^2(\pi\times P)} + 2\|H_a - H\|_{L^2(\pi\times P)}\|H\|_{L^2(\pi\times P)} + C\,a^{\delta}\,\pi(V^{1-\alpha}). \qquad (35)$$

Then we have

$$n^{-1}\sum_{k=1}^n g^2_{a_n}(X_k) - Pg^2_{a_n}(X_k) = n^{-1}\sum_{k=1}^n g^2(X_k) - Pg^2(X_k)$$
$$+ n^{-1}\sum_{k=1}^n\left[g^2_{a_n}(X_k) - Pg^2_{a_n}(X_k) - \left(g^2(X_k) - Pg^2(X_k)\right)\right].$$

Since $\pi(|g^2 - Pg^2|) < \infty$ and $\pi(g^2 - Pg^2) = 0$, the weak law of large numbers for Markov chains implies that $n^{-1}\sum_{k=1}^n g^2(X_k) - Pg^2(X_k)$ converges in probability to zero. And

$$n^{-1}\,\mathbb{E}\left[\left|\sum_{k=1}^n g^2_{a_n}(X_k) - Pg^2_{a_n}(X_k) - \left(g^2(X_k) - Pg^2(X_k)\right)\right|\right] \leq \mathbb{E}\left[\left|g^2_{a_n}(X_0) - Pg^2_{a_n}(X_0) - \left(g^2(X_0) - Pg^2(X_0)\right)\right|\right]$$

and the rhs converges to zero as a consequence of (35). □



3.8. Proof of Proposition 2.5.

Proof. Write 



^ Pe k ^ i g'i n .{ X k-li #fc-l) - gl n ( X k-l,6k-l) - ^2 - P fe _ifl'a n (^fc-1' - 9 2 a n { X ki®k-l) 

k=l 

n 

+ Y,9 2 a n (X k ,9 k . l )-g 2 an (X k ,e k ) + {g 2 an (X n ,9 n )-g 2 an (X ,9 )) . (36) 



k=i 



k=l 
We first deal with the first term. For $l, s \geq 0$, define

n 

?2 = lr*r >n} ^\_ya n+s (Xk-lJk-l) - gl n+s (Xk,0k-l)- 



2') 



fe=l 



We show that for any // > 0, and any (x,#) £ X x K;, lim^ oo F^ (n 1 \T^}\ > fxj = 0. 
Then we can apply Lemma[3Z[2]to conclude that Ylk=i p e k -i9 2 a n (^fc-l A-l)-#a n {X k , k -\) 
converges in IP-probability to zero. We have 

n 

fP = l r ^ .Vu , „(p s g 2 (Xk-i,0k-i)-d (X k ,9k- 

n ' s { T K,> n }^-^ { T K,>k-l}\ 8 k _ 1 : Ja n+s \ K 1) K IJ Ua n+S \ Kl K 

k=l 



and E 



(0 



0. Let k > 1 



1 {r Ki >fc-l} { P h- 1 9a n+s ( X k-l^k-l) ~ 9a n+s ( X k,0k~l)) \^k-l 

such that 2(/3 + an) < 2((3 + a) + e. Set p = (2((3 + a) + e)(2(3 + 2aK)~ l . By Proposition 
13.61 we get 



r(0 



by assumption. Since p > 1, the result follows. 

We use the same strategy to deal with the second term on the rhs of (36). For $l, s \geq 0$, define



T ni = f 1 { 'r K >n} ^29l n+s (X k ,dk~i) ~ gl n+s (X k ,§ k ) 
1 k=l 



1 {r K >n}Z^ 1 {r K >fc-l} (^fcA-l) ~ 9a n+s (-Xfc, #fc) J ( 5a„ +s (-Xfc, 0fc-l) + 9a n+3 { X k 

1 k=l ' 



We apply Proposition 13.21 (i) with k = 5/2 to get sup^g/g^ \g a (x,6) + g a (x,9')\ < 
C{Ki)a~ l+5 / 2 VP +aS / 2 (x). This together with Proposition EH (i) (with k > 1 and 5/2 > 0) 



gives: 



f{2) 



< C(K,) (a /2 (a n+s )) 2 ^l {rK >fc _ 1} L> /3+Q(5/2 (0 fe _ 1 ,^)y 2 ^+ 5 )(X fe ) 



fc=i 



n _1 (C5/2( a n+s)) 2 = O (n~ 1+p ^ 2_<5 ^) then Kronecker's lemma and (llip implies that n~ 1 T^} 
converges in probability to zero. 

For the last term on the rhs of (36), define

m = i. 



{r K >n} \ : >an+ 



9a n+s ( X n, Qn) ~ 9a n+s ( X 0: fa] 



Then with kq > 1 such that 2{(3 + a/to) < 1, we get the bound E® g (rT^fQ | ) < 

^{T K ;>n} 



C^ra^E^ ( y 2 ^+ QK o)(X n )l ^ w ) = 0(n 1 -*W+ aK *>). The rest of the proof is simi- 



lar to the above. 



□ 
3.9. Proof of Proposition 2.8.

Proof. We will show that for any $p \geq 0$, $n \geq 1$, any compact subset $K$ of $\Theta$ and any $\delta > 0$,

$$\sup_{(x,\theta)\in\mathsf{X}_0\times\Theta_0}\mathbb{P}^{(p)}_{x,\theta}\left(C_{n,p}(K) > \delta\right) \leq B(n,p), \qquad (37)$$

where the bound $B(n,p)$ satisfies $\lim_{n\to\infty}B(n,p) = 0$ for any $p \geq 0$ and $\lim_{p\to\infty}B(n,p) = 0$
for any n > 1. This clearly implies (|15[) and (|16p and the result will follow from Proposition 
\2l\ We have 



C„ )P (K) < sup 1 <- 



>J} 



(1) 



+ sup 1,<- n 
l>n ^ TK> ^ 



(2) 



+j-ie j 



(38) 



We start with the second term on the rhs of (j38|) . By Doob's inequality and B21 for N > n, 



X '° X n<7<N < T K>i} 



(2) 



> 5 



JV 



N 



It follows that 



(2) 



j=n 



2 

P+3 



(39) 



To deal with the first term on the rhs of (38), we proceed as in the proof of Theorem 2.1. We consider the sequence $\{a_n, n \geq 0\}$ such that $a_n \propto n^{-\rho}$, $a_n \in (0,1/2]$ where $\rho \in (0,1)$ is as in the statement of the Proposition. For $1 \leq n \leq l$ and $p \geq 0$, we introduce the partial sum

l 

def 



where Tg(x) = Tg(x) — /i(#). Under E(2l T# admits an approximate Poisson equation g a 
for any j > 1 and we have . (Xj) = (1 — aj)^ 1 g aj (Xj,8j) — Pg g aj (Xj,9j). Using this 
and following the same approach as in the proof of Theorem 12 .1\ we decompose S n i(p, K) 
as 

s njl ( P , K) = tS + 1$ + + r$ + r# + tS 
where



T n} = i { r K > l} E * { ( (1 - a ^ - x ) M^)- 



r(2) 



n(3) 
l n,l 



2-1 



3=n 



T n} = VkX} E Vk>j} (7P+J+1 - l P +j) ~9a J+1 



1-1 



,WJJ. 



j=n 



z-i 



t— i 



We deal with each of these terms using similar techniques as in the proofs of Theorem 2.1 and Theorem 2.3. Some of the details are thus omitted. Let $\delta > 0$ be arbitrary.
On Term $T^{(1)}_{n,l}$. Take $\kappa > 1$ such that $\eta + \alpha\kappa < 1-\alpha$. Then Proposition 3.2 yields $|g_{a_j}(\tilde X_j,\tilde\theta_j)| \leq C(K)\,V^{\eta+\alpha\kappa}(\tilde X_j)$ on $\{\tilde\theta_j \in K\}$. Then by Markov's inequality we have



\ Z>n 



r. 



(i) 



< <5 _1 C(K)y(x) [ Tn+pn 1 -"^ + £ 7p+j r ' | • (40) 
The last inequality uses Proposition 13.51 and Proposition 13.41 (i) . 

On Term T^j . Let e > 0, k > 1 such that e € (p, (1 — a)(rj + Ata)" 1 — 1). That is 
(1 + e) (77 + an) < 1 — a and e > p. Then 



\ l>n 



n(2) 



> 5 



< (2/^) 1+t E^ l { V K >.}T;+n + 5>£?1 { - >I} ^(^) 



1+e 



Z>n 



l+e 



< {2/5) 1 ^C(K)V{x) [ T^n 1 "" + £ 7$ ) • (41) 

j>n-l 
On Term $T^{(3)}_{n,l}$. Take $\kappa > 1$ and $\delta > 0$ such that $2\eta + \alpha(\kappa+\delta) < 1-\alpha$ and $\eta + \alpha(\kappa+\delta) < 1/2$.
By PropositionE/Jand ^\g a (x, 9) - g a (x, 9')\ < C(K) sup eeK |T e |y,&(a) \9-9'\ v^ + < K+5 \x) 
Then by Markov's inequality 



>n 



r (3) 



> S) < (1/«)E« El {?K>J ^(aj) |*g,(X if y i+1 )| V***<*">(X i+ i 

7 \i>n 



From E(2] and the structure of the algorithm we compute that 



(p) 



E 



It follows 



>n 



* 9j {X jt Y j+1 ) V^ +5 HX J+1 )\T 3 ) 1 { ^ k> . } < C(K)F 2 ^+ 5 )(1,)- 



r (3) 



> *) < (V^)C(K) ^7 P 2 + ^i^ 1+P ~ a + E^+i-i^j ( 42 ) 



On Term T^j . By Markov's inequality, 



>3 -p 

\ l>n 



>5)< (1/*)E« ( £ ( 7p+j - 7 P+J+ i) 1 {Vk - 

/ \3>n 

< (l/«)C(K)Eg ( £ ( 7p+j - 7 P+J+ i) 1 { - K> I 

< (l/S)C(K)V(x) (n 1 - a (7 P +n " 7 P+ n+l) + 7 P +n) • (43) 
On Term T^ 5 , . Take ft € (1, 2) such that rj + an < 1 — a. One can check as in Proposition 



that for any compact K \P e g a (x,9) - P e g a >(x,9)\ < C(K)\a - a'\a K ~ 2 V^ +aK {x). And 
for aj oc j~ p , \cij — Oj_i|a^~ 2 oc j~ 1 a^~ 1 = o(j _1 ). Hence, by Markov's inequality, we get: 



$ -P 
\ l>n 



T 



(5) 



it .1 



(44) 



On Term T f [ 6 ^ . Let k > 1 such that 2(r/ + an/2) < 1 — a. Consider the term Dj 



l { V K>j} Trt ^(^ +1) j )-P^i o .(X i ,%)j so that = l { ^ >l} E l 1= n D r We note 
that Dj is a martingale difference and by Doob's inequality we get: 



'3 BUP 
\ l>n 



n(6) 
"n,Z 



/ 7>n 



< (l/5) 2 C(K)y(x) [ 7 2 +n „ 1 n 1 ~ Q+P + ) . (45) 

3> n 



By combining (|3U|) - (}4"5]) and (fTH|) . we get (f3"7|) as claimed. 
3.10. Proof of the results of Section [270 



□ 
3.10.1. Proof of Proposition 2.9. The function $a(\theta)$ is of class $C^1$. Hence by Assumption C1 and the Mean Value Theorem $L = \{\theta \in \mathbb{R} : a(\theta) = \bar a\}$ is not empty. It also follows from C1 that the function $\theta \to \int_0^\theta \cosh(u)(\bar a - a(u))\,du$ is bounded from below; so we can find $K_1$ such that $w(\theta) = \int_0^\theta \cosh(u)(\bar a - a(u))\,du + K_1 \geq 0$. Moreover $(a(\theta) - \bar a)\,w'(\theta) = -\cosh(\theta)(a(\theta) - \bar a)^2 \leq 0$ with equality iff $\theta \in L$. By Sard's theorem $w(L)$ has an empty interior. Again from C1 it follows that $L$ is included in a bounded interval of $\mathbb{R}$, and since $\lim_{\theta\to\pm\infty}w(\theta) = \infty$, we can find $M_0$ such that $L \subset \{\theta \in \mathbb{R} : w(\theta) < M_0\}$ and $\mathcal{W}_M$ is bounded thus compact for any $M > 0$.



3.10.2. Proof of Proposition PT7T1 A straightforward calculation using the boundedness of 
|Vlog7r(a;)| implies that for any 9 G K, 

9 



90 



log (a e (x,y)q e (x,y)) 



<C(K)(l + \y-x\ 2 ) 



for some finite constant C(K). It follows that 
9 



/ 



90 



{a e (x,y)qg(x,y)) f(y) 



dy<C(K)\f\ vf J {l + \y-x\ 2 )Vf(y)q e (x,y)dy. 



We do a change of variable y = b(x) + e e ^ 2 z, where b(x) = x + 0.5e e V log 7r(x) and using 
the boundedness of |Vlog7r(x)|, we get: 

d 



sup 



K J 



09 



{a e (x,y)q e (x,y)) f{y) 



dy<C(K)\f\ v ,Vf(x) I (l + \z\ 2 f s/2 g(z)dz, 



where g is the density of the mean zero (i-dimensional Gaussian distribution with covari- 
ance matrix Id- The stated result follows by an application of the Mean Value Theorem. 